sklearn - pokutuna

sklearn

Splitting data into training and test sets

code:example.py

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.4, random_state=101)
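A minimal self-contained sketch of the split above, to check the resulting sizes. The toy `X` and `y` here are hypothetical stand-ins (the note does not say what data was used); with 10 samples and `test_size=0.4`, 4 samples should go to the test set.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical toy data: 10 samples with 2 features each
X = np.arange(20).reshape(10, 2)
y = np.arange(10)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.4, random_state=101
)

# 60% of the rows go to training, 40% to testing
print(X_train.shape, X_test.shape)  # (6, 2) (4, 2)

# Fixing random_state makes the shuffle reproducible
_, _, _, y_test_again = train_test_split(X, y, test_size=0.4, random_state=101)
same_split = np.array_equal(y_test, y_test_again)
```

Fixing `random_state` is what makes the split repeatable between runs; leaving it out gives a different shuffle each time.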

Evaluation functions

Linear regression

code:linear.py

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.linear_model import LinearRegression

reg = LinearRegression().fit(X_train, y_train)

reg.intercept_ # intercept

reg.coef_ # coefficients

pd.DataFrame(reg.coef_, X.columns, columns=['Coefficient']) # columns must be list-like

predictions = reg.predict(X_test)

plt.scatter(y_test, predictions) # scatter plot of actual vs. predicted values

sns.displot((y_test - predictions), bins=50) # histogram of residuals; should look roughly normal if the fit is good
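Besides eyeballing the residuals, the fit can also be scored numerically. A rough sketch using functions from `sklearn.metrics`; the synthetic data (y = 3*x0 - 2*x1 + 5 plus noise) is an assumption made up for illustration, not the data from this note.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Hypothetical synthetic data: y = 3*x0 - 2*x1 + 5 plus small Gaussian noise
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = 3 * X[:, 0] - 2 * X[:, 1] + 5 + rng.normal(scale=0.1, size=200)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.4, random_state=101
)
reg = LinearRegression().fit(X_train, y_train)
predictions = reg.predict(X_test)

mae = mean_absolute_error(y_test, predictions)  # mean absolute error
mse = mean_squared_error(y_test, predictions)   # mean squared error
rmse = np.sqrt(mse)                             # root mean squared error, same unit as y
r2 = r2_score(y_test, predictions)              # coefficient of determination

print(f"MAE={mae:.3f} MSE={mse:.3f} RMSE={rmse:.3f} R2={r2:.3f}")
```

RMSE is in the same unit as the target, so it is usually the easiest of these to read alongside the residual histogram.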

Classification metrics

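For a confusion matrix, use sklearn.metrics.confusion_matrix

confusion_matrix(y_true, y_pred, labels=labels) # labels is keyword-only in recent scikit-learn

For true/false classification, passing labels=[True, False] works well

With raw bools and no labels, False comes first (top row, left column), because labels default to sorted order and False sorts before True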
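A small sketch of the ordering difference described above. The `y_true` / `y_pred` values are made up for illustration.

```python
from sklearn.metrics import confusion_matrix

# Hypothetical labels: 3 positives and 2 negatives
y_true = [True, True, True, False, False]
y_pred = [True, False, True, False, True]

# Default: labels are taken in sorted order, so False is row/column 0
# Layout: [[TN, FP], [FN, TP]]
default_cm = confusion_matrix(y_true, y_pred)
print(default_cm.tolist())  # [[1, 1], [1, 2]]

# Explicit labels: True is row/column 0
# Layout: [[TP, FN], [FP, TN]]
labeled_cm = confusion_matrix(y_true, y_pred, labels=[True, False])
print(labeled_cm.tolist())  # [[2, 1], [1, 1]]
```

Rows are the true labels and columns are the predicted labels, so passing `labels=[True, False]` puts the true positives in the top-left cell.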

The one that reports various metrics at once