Evaluation Metrics and Scores
Overview
The confusion matrix is one of the most comprehensive ways to represent the evaluation results of a two-class (binary) classifier.
https://gyazo.com/b398427d15b66f90e3247efdb9c73305
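As a rough illustration of how the four cells of a confusion matrix are counted, here is a minimal pure-Python sketch. The function name `confusion_counts` and the sample labels are hypothetical, and label 1 is assumed to be the positive class. The returned layout `[[TN, FP], [FN, TP]]` (rows are true classes, columns are predicted classes) mirrors what scikit-learn's `confusion_matrix` returns for labels 0 and 1.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count the four confusion-matrix cells for a binary problem.

    Returns [[TN, FP], [FN, TP]]: rows are true classes, columns are predictions.
    """
    tn = fp = fn = tp = 0
    for t, p in zip(y_true, y_pred):
        if t == positive and p == positive:
            tp += 1  # positive sample correctly predicted as positive
        elif t == positive:
            fn += 1  # positive sample missed
        elif p == positive:
            fp += 1  # negative sample wrongly flagged as positive
        else:
            tn += 1  # negative sample correctly predicted as negative
    return [[tn, fp], [fn, tp]]


# Hypothetical labels, for illustration only
y_true = [0, 0, 0, 0, 1, 1, 1, 0, 1, 0]
y_pred = [0, 0, 1, 0, 1, 0, 1, 0, 1, 0]
print(confusion_counts(y_true, y_pred))  # [[5, 1], [1, 3]]
```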
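The metrics are computed as follows, where TP, TN, FP, and FN are the numbers of true positives, true negatives, false positives, and false negatives.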
$ Accuracy = \frac{TP + TN}{TP + TN + FP + FN}
$ Precision = \frac{TP}{TP + FP}
$ Recall = \frac{TP}{TP + FN}
$ F = 2 \cdot \frac{Precision \cdot Recall}{Precision + Recall}
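To make the formulas concrete, the following sketch evaluates them for one set of counts. The helper `binary_metrics` and the counts are hypothetical. Returning 0.0 when a denominator is zero is a convention assumed here, not something the formulas themselves define.

```python
def binary_metrics(tp, tn, fp, fn):
    """Compute accuracy, precision, recall, and F from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    # Assumed convention: an undefined ratio (zero denominator) is reported as 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f": f}


# Hypothetical counts, for illustration only
m = binary_metrics(tp=30, tn=40, fp=10, fn=20)
for name, value in m.items():
    print("{}: {:.2f}".format(name, value))
# accuracy: 0.70
# precision: 0.75
# recall: 0.60
# f: 0.67
```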
Coding
code: python
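# Assumes X_train, X_test, y_train, y_test already hold a binary-classification split
# Train a decision tree as usual
from sklearn.tree import DecisionTreeClassifier
tree = DecisionTreeClassifier(max_depth=2).fit(X_train, y_train)
pred_tree = tree.predict(X_test)
print('Test score: {:.2f}'.format(tree.score(X_test, y_test)))
# Train logistic regression as usual
from sklearn.linear_model import LogisticRegression
logreg = LogisticRegression(C=0.1).fit(X_train, y_train)
pred_logreg = logreg.predict(X_test)
print('logreg score: {:.2f}'.format(logreg.score(X_test, y_test)))
# Baseline that always predicts the most frequent class (defines pred_most_frequent used below)
from sklearn.dummy import DummyClassifier
dummy_majority = DummyClassifier(strategy='most_frequent').fit(X_train, y_train)
pred_most_frequent = dummy_majority.predict(X_test)
# Train a dummy classifier with the default strategy
# Note: the default is 'prior' since scikit-learn 0.24 ('stratified' before);
# the output below appears to come from the older 'stratified' default
dummy = DummyClassifier().fit(X_train, y_train)
pred_dummy = dummy.predict(X_test)
print('dummy score: {:.2f}'.format(dummy.score(X_test, y_test)))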
# Inspect the confusion matrices
from sklearn.metrics import confusion_matrix
print('Most frequent class:')
print(confusion_matrix(y_test, pred_most_frequent))
print('\nDummy model:')
print(confusion_matrix(y_test, pred_dummy))
print('\nDecision tree:')
print(confusion_matrix(y_test, pred_tree))
print('\nLogistic Regression')
print(confusion_matrix(y_test, pred_logreg))
from sklearn.metrics import f1_score
print('f1 score most frequent: {:.2f}'.format(f1_score(y_test, pred_most_frequent)))
print('f1 score dummy: {:.2f}'.format(f1_score(y_test, pred_dummy)))
print('f1 score tree: {:.2f}'.format(f1_score(y_test, pred_tree)))
print('f1 score logistic regression: {:.2f}'.format(f1_score(y_test, pred_logreg)))
--------------------------------------------------------------------------
Test score: 0.92
logreg score: 0.98
dummy score: 0.80
Most frequent class:
Dummy model:
Decision tree:
Logistic Regression
f1 score dummy: 0.11
f1 score tree: 0.55
f1 score logistic regression: 0.89
--------------------------------------------------------------------------
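The gap between accuracy and F score seen for the dummy models can be reproduced on synthetic labels. The sketch below uses a hypothetical 90/10 class split (not the data used above) and a baseline that always predicts the majority class. Treating an undefined precision as 0.0 is an assumed convention.

```python
# Hypothetical imbalanced test labels: 90 negatives (0) and 10 positives (1)
y_true = [0] * 90 + [1] * 10
# A majority-class baseline predicts 0 for every sample
y_pred = [0] * len(y_true)

pairs = list(zip(y_true, y_pred))
tp = sum(1 for t, p in pairs if t == 1 and p == 1)
fp = sum(1 for t, p in pairs if t == 0 and p == 1)
fn = sum(1 for t, p in pairs if t == 1 and p == 0)

accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
# No positive predictions, so precision is undefined; assume 0.0 here
precision = tp / (tp + fp) if (tp + fp) else 0.0
recall = tp / (tp + fn) if (tp + fn) else 0.0
f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0

print('accuracy: {:.2f}'.format(accuracy))  # accuracy: 0.90
print('f1: {:.2f}'.format(f1))              # f1: 0.00
```

Accuracy looks high only because the negative class dominates, while the F score shows that no positive sample was ever found.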
// TODO P279