Tuning machine learning models with grid search
Tuning hyperparameters with grid search
Coding - Training and tuning a support vector machine pipeline
code: Python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
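# X (features) and y (class labels) are assumed to be loaded already; the loading step is not shown in these notes
# Convert the categorical labels 'M' and 'B' in y to integers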
le = LabelEncoder()
y = le.fit_transform(y)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, stratify=y, random_state=1)
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC
pipe_svc = make_pipeline(StandardScaler(), SVC(random_state=1))
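# Assumed search range; param_range is not defined in the original notes
param_range = [0.0001, 0.001, 0.01, 0.1, 1.0, 10.0, 100.0, 1000.0]
# Every grid value must be a list, including the single kernel names
param_grid = [{'svc__C': param_range, 'svc__kernel': ['linear']}, {'svc__C': param_range, 'svc__gamma': param_range, 'svc__kernel': ['rbf']}]
gs = GridSearchCV(estimator=pipe_svc, param_grid=param_grid, scoring='accuracy', cv=10, n_jobs=-1)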
gs = gs.fit(X_train, y_train)
print(gs.best_score_)
print(gs.best_params_)
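# With the default refit=True, best_estimator_ is already fit on X_train, so the extra fit is redundant but harmless
clf = gs.best_estimator_
clf.fit(X_train, y_train)
print('Test accuracy: {}'.format(clf.score(X_test, y_test)))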
--------------------------------------------------------------------------------
0.9846153846153847
{'svc__C': 100.0, 'svc__gamma': 0.001, 'svc__kernel': 'rbf'}
Test accuracy: 0.9736842105263158
--------------------------------------------------------------------------------
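A list of dictionaries passed as `param_grid` is searched as a union of separate grids, not as one big product. The sketch below counts the candidates with scikit-learn's `ParameterGrid`. The eight-value `param_range` is an assumption, since these notes do not show its definition.

```python
from sklearn.model_selection import ParameterGrid

# Assumed search range (hypothetical; not defined in the notes above)
param_range = [0.0001, 0.001, 0.01, 0.1, 1.0, 10.0, 100.0, 1000.0]

# Same two-part grid as in the SVM example: gamma is only searched for the RBF kernel
param_grid = [
    {'svc__C': param_range, 'svc__kernel': ['linear']},
    {'svc__C': param_range, 'svc__gamma': param_range, 'svc__kernel': ['rbf']},
]

# Expand the grid into the individual parameter combinations
candidates = list(ParameterGrid(param_grid))
n_linear = sum(1 for p in candidates if p['svc__kernel'] == 'linear')
n_rbf = sum(1 for p in candidates if p['svc__kernel'] == 'rbf')

print(len(candidates), n_linear, n_rbf)  # 72 8 64
```

Under this assumed range, `cv=10` means 720 model fits plus one final refit, which is why `n_jobs=-1` helps.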
Algorithm selection with nested cross-validation
Coding - Comparing an SVM with a decision tree
code: Python
import numpy as np
from sklearn.model_selection import cross_val_score
gs = GridSearchCV(estimator=pipe_svc, param_grid=param_grid, scoring='accuracy', cv=2)
scores = cross_val_score(gs, X_train, y_train, scoring='accuracy', cv=5)
print('CV accuracy: {:.3f} +/- {:.3f}'.format(np.mean(scores), np.std(scores)))
from sklearn.tree import DecisionTreeClassifier
gs = GridSearchCV(estimator=DecisionTreeClassifier(random_state=0), param_grid=[{'max_depth': [1, 2, 3, 4, 5, 6, 7, None]}], scoring='accuracy', cv=2)
scores = cross_val_score(gs, X_train, y_train, scoring='accuracy', cv=5)
print('CV accuracy: {:.3f} +/- {:.3f}'.format(np.mean(scores), np.std(scores)))
--------------------------------------------------------------------------------
CV accuracy: 0.974 +/- 0.015
CV accuracy: 0.934 +/- 0.016
--------------------------------------------------------------------------------
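Passing a `GridSearchCV` object to `cross_val_score` hides the two loops of nested cross-validation. The sketch below writes the outer loop out by hand for the decision tree case. It is only an illustration: it runs on synthetic data from `make_classification` (an assumption, not the dataset used above) and shuffles the outer folds, so its scores will not match the output above.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (hypothetical; not the dataset from the notes)
X_demo, y_demo = make_classification(n_samples=200, n_features=10, random_state=1)

# Outer loop: 5 folds used only to estimate generalization performance
outer_cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=1)
outer_scores = []

for train_idx, test_idx in outer_cv.split(X_demo, y_demo):
    # Inner loop: 2-fold grid search that only sees the outer training fold
    inner_gs = GridSearchCV(
        estimator=DecisionTreeClassifier(random_state=0),
        param_grid=[{'max_depth': [1, 2, 3, 4, 5, 6, 7, None]}],
        scoring='accuracy',
        cv=2,
    )
    inner_gs.fit(X_demo[train_idx], y_demo[train_idx])
    # Score the refit best model on the held-out outer fold
    outer_scores.append(inner_gs.score(X_demo[test_idx], y_demo[test_idx]))

print('CV accuracy: {:.3f} +/- {:.3f}'.format(np.mean(outer_scores), np.std(outer_scores)))
```

Because the hyperparameters are chosen without ever seeing the outer test fold, the averaged outer score is a less optimistic estimate than `best_score_` from a single grid search.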