Building a model that predicts class probabilities with logistic regression

Overview

Why the output of logistic regression's activation function (the sigmoid function) can be interpreted as a probability

https://gyazo.com/a1dc8cff7062356d23a837be035e4cff https://gyazo.com/2a216514705ff7ebd44a7f8a519ed7f6
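One way to see why the sigmoid output can be read as a probability: logistic regression models the log-odds (logit) of the positive class as the linear net input, logit(p) = ln(p / (1 - p)) = z with z = w^T x. Solving this for p gives p = 1 / (1 + e^(-z)), which is exactly the sigmoid function. The minimal NumPy sketch below checks numerically that the sigmoid undoes the logit and that its outputs stay strictly between 0 and 1. The function names `sigmoid` and `logit` are chosen for illustration only.

```python
import numpy as np


def sigmoid(z):
    """Logistic sigmoid: maps any real net input z into the open interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))


def logit(p):
    """Log-odds of a probability p: ln(p / (1 - p))."""
    return np.log(p / (1.0 - p))


# Probabilities -> log-odds (net input) -> back through the sigmoid
p = np.array([0.1, 0.25, 0.5, 0.75, 0.9])
z = logit(p)
p_back = sigmoid(z)

print(np.round(z, 4))          # log-odds; p = 0.5 corresponds to z = 0
print(np.allclose(p, p_back))  # the sigmoid inverts the logit

# Net inputs of large magnitude approach, but do not reach, 0 and 1
print(sigmoid(np.array([-10.0, 0.0, 10.0])))
```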

Coding

Logistic regression (No library)
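As an illustration of what a library-free version can look like, here is a minimal NumPy-only sketch of binary logistic regression trained by full-batch gradient descent on the mean log loss. The class name `LogisticRegressionGD` and its parameters `eta`, `n_iter` and `random_state` are assumptions made for this sketch, not a library API; it handles two classes (labels 0 and 1) only, so a multiclass problem like Iris would need a one-vs-rest or softmax extension.

```python
import numpy as np


class LogisticRegressionGD:
    """Binary logistic regression trained with full-batch gradient descent.

    Hypothetical helper for illustration only; labels must be 0 or 1.
    """

    def __init__(self, eta=0.1, n_iter=500, random_state=1):
        self.eta = eta                    # learning rate
        self.n_iter = n_iter              # number of passes over the training data
        self.random_state = random_state  # seed for the initial weights

    def net_input(self, X):
        """Net input z = Xw + b."""
        return X.dot(self.w_) + self.b_

    def predict_proba(self, X):
        """Probability of class 1, i.e. sigmoid(z)."""
        z = np.clip(self.net_input(X), -250, 250)  # avoid overflow in np.exp
        return 1.0 / (1.0 + np.exp(-z))

    def predict(self, X):
        """Class 1 if the probability is at least 0.5, otherwise class 0."""
        return (self.predict_proba(X) >= 0.5).astype(int)

    def fit(self, X, y):
        rgen = np.random.RandomState(self.random_state)
        self.w_ = rgen.normal(loc=0.0, scale=0.01, size=X.shape[1])
        self.b_ = 0.0
        self.losses_ = []
        for _ in range(self.n_iter):
            output = self.predict_proba(X)
            # Mean log loss at the current weights (clipped to avoid log(0))
            p = np.clip(output, 1e-15, 1 - 1e-15)
            self.losses_.append(-np.mean(y * np.log(p) + (1 - y) * np.log(1 - p)))
            # Gradient step: the gradient of the mean log loss is -X^T (y - output) / n
            errors = y - output
            self.w_ += self.eta * X.T.dot(errors) / X.shape[0]
            self.b_ += self.eta * errors.mean()
        return self


# Usage on two synthetic, well-separated Gaussian blobs
rng = np.random.RandomState(0)
X = np.vstack((rng.normal(-2.0, 1.0, size=(50, 2)),
               rng.normal(2.0, 1.0, size=(50, 2))))
y = np.hstack((np.zeros(50), np.ones(50)))

model = LogisticRegressionGD(eta=0.1, n_iter=500).fit(X, y)
print('training accuracy:', (model.predict(X) == y).mean())
print('loss: first', round(model.losses_[0], 4), 'last', round(model.losses_[-1], 4))
```

Because the log loss is convex, a small enough learning rate lets plain gradient descent reach a good solution without any library optimizer; the recorded `losses_` make it easy to check that training is actually converging.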

scikit-learn

code: Python

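from sklearn import datasets
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

# plot_decision_regions is not defined in this note; it is assumed to be
# a helper defined beforehand with the signature
# plot_decision_regions(X, y, classifier, test_idx=None)

# Load Iris and keep only petal length and petal width (columns 2 and 3)
iris = datasets.load_iris()
X = iris.data[:, [2, 3]]
y = iris.target

# 70/30 split, stratified so that each class keeps its proportion
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1, stratify=y)

# Standardize the features using statistics from the training set only
sc = StandardScaler()
sc.fit(X_train)
X_train_std = sc.transform(X_train)
X_test_std = sc.transform(X_test)

# Create a logistic regression instance (C is the inverse of the regularization strength)
lr = LogisticRegression(C=100.0, random_state=1)

# Fit the model to the training data
lr.fit(X_train_std, y_train)

# Plot the decision regions for training + test samples
# (the 45 test samples occupy indices 105-149 of the combined array)
X_combined_std = np.vstack((X_train_std, X_test_std))
y_combined = np.hstack((y_train, y_test))
plot_decision_regions(X=X_combined_std, y=y_combined, classifier=lr, test_idx=range(105, 150))

# Axis labels
plt.xlabel('petal length [standardized]')
plt.ylabel('petal width [standardized]')

# Legend (upper left)
plt.legend(loc='upper left')

# Show the figure
plt.tight_layout()
plt.show()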

https://gyazo.com/f6ab646b8e732a4236e3cf63c14d3191
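The scikit-learn code calls `plot_decision_regions`, which this note does not define. Below is a minimal sketch of such a helper, assuming only the signature used in that call (`X`, `y`, `classifier`, `test_idx`) plus a hypothetical `resolution` parameter, and a two-feature input. It evaluates `classifier.predict` on a grid, fills the predicted regions with `contourf`, scatters each class, and circles the test samples. It is a stand-in for illustration, not necessarily the helper the original code relied on.

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import ListedColormap


def plot_decision_regions(X, y, classifier, test_idx=None, resolution=0.02):
    """Plot the decision regions of a fitted 2-feature classifier (hypothetical helper)."""
    markers = ('s', 'x', 'o', '^', 'v')
    colors = ('red', 'blue', 'lightgreen', 'gray', 'cyan')
    classes = np.unique(y)
    cmap = ListedColormap(list(colors[:len(classes)]))

    # Grid covering the data range with a margin of 1 unit on each side
    x1_min, x1_max = X[:, 0].min() - 1, X[:, 0].max() + 1
    x2_min, x2_max = X[:, 1].min() - 1, X[:, 1].max() + 1
    xx1, xx2 = np.meshgrid(np.arange(x1_min, x1_max, resolution),
                           np.arange(x2_min, x2_max, resolution))

    # Predict a label for every grid point, then reshape back to the grid
    Z = classifier.predict(np.array([xx1.ravel(), xx2.ravel()]).T)
    Z = Z.reshape(xx1.shape)
    plt.contourf(xx1, xx2, Z, alpha=0.3, cmap=cmap)
    plt.xlim(xx1.min(), xx1.max())
    plt.ylim(xx2.min(), xx2.max())

    # Scatter the samples of each class with its own color and marker
    for idx, cl in enumerate(classes):
        plt.scatter(x=X[y == cl, 0], y=X[y == cl, 1], alpha=0.8,
                    color=colors[idx], marker=markers[idx], label=f'Class {cl}')

    # Circle the test samples if their indices are given
    if test_idx is not None:
        X_test = X[list(test_idx), :]
        plt.scatter(X_test[:, 0], X_test[:, 1], facecolors='none',
                    edgecolors='black', linewidths=1, marker='o', s=100,
                    label='Test set')
```

Any object with a `predict` method that accepts an `(n_samples, 2)` array works as `classifier`, which is why the fitted `lr` can be passed directly.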

The class-membership probabilities of the first three samples in the test set can be predicted as follows.

code: Python

# Class-membership probabilities of the first three test samples
lr.predict_proba(X_test_std[:3, :])

--------------------------------------------------------------------------

array([[3.20136878e-08, 1.46953648e-01, 8.53046320e-01],
       [8.34428069e-01, 1.65571931e-01, 4.57896429e-12],
       [8.49182775e-01, 1.50817225e-01, 4.65678779e-13]])

--------------------------------------------------------------------------
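Each row of the `predict_proba` output is one test sample and each column one class (0 = setosa, 1 = versicolor, 2 = virginica), so every row sums to 1. For three or more classes, scikit-learn's `LogisticRegression` either fits a multinomial (softmax) model or, in the one-vs-rest scheme, computes a sigmoid per class and rescales each row to sum to 1; which one applies depends on the scikit-learn version and the chosen settings. The sketch below shows both normalizations on a made-up score matrix `Z`, which is illustrative only and not taken from the fitted model.

```python
import numpy as np


def softmax(Z):
    """Row-wise softmax: turns one score per class into a probability distribution."""
    Z_shift = Z - Z.max(axis=1, keepdims=True)  # subtract the row max for numerical stability
    expZ = np.exp(Z_shift)
    return expZ / expZ.sum(axis=1, keepdims=True)


def ovr_normalized(Z):
    """One-vs-rest alternative: per-class sigmoid, then rescale each row to sum to 1."""
    P = 1.0 / (1.0 + np.exp(-Z))
    return P / P.sum(axis=1, keepdims=True)


# Hypothetical net inputs: 3 samples x 3 classes
Z = np.array([[-8.0, 1.0, 3.0],
              [4.0, 2.0, -9.0],
              [5.0, 3.0, -10.0]])

for P in (softmax(Z), ovr_normalized(Z)):
    print(np.round(P, 4))
    print(P.sum(axis=1))  # each row is a probability distribution over the classes
```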

code: Python

# Get the predicted class labels
# (equivalent to lr.predict_proba(X_test_std[:3, :]).argmax(axis=1))
lr.predict(X_test_std[:3, :])

--------------------------------------------------------------------------

array([2, 0, 0])

--------------------------------------------------------------------------
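The commented-out line in the block above indicates that `predict` amounts to picking, for each sample, the class with the highest probability. As a quick check that needs no fitted model, applying NumPy's `argmax` to the probability values printed earlier (copied by hand) reproduces the labels returned by `lr.predict`:

```python
import numpy as np

# Class-membership probabilities printed above (3 test samples x 3 classes)
proba = np.array([[3.20136878e-08, 1.46953648e-01, 8.53046320e-01],
                  [8.34428069e-01, 1.65571931e-01, 4.57896429e-12],
                  [8.49182775e-01, 1.50817225e-01, 4.65678779e-13]])

# The predicted label is the column with the highest probability in each row
labels = proba.argmax(axis=1)
print(labels)  # → [2 0 0]
```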