Plotting Iris decision boundaries with a perceptron

Coding

code: Python

from sklearn import datasets
import numpy as np

# Load the Iris dataset
iris = datasets.load_iris()
# Extract the 3rd and 4th feature columns (petal length, petal width)
X = iris.data[:, [2, 3]]
# Get the class labels
y = iris.target
# Print the unique class labels
print('Class labels:', np.unique(y))

------------------------------------------------------------------------------

Class labels: [0 1 2]

------------------------------------------------------------------------------
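Column indices 2 and 3 of `iris.data` correspond to petal length and petal width. The short check below is a sketch that prints the selected feature names and shows why the column indices must be passed as a list: `[:, [2, 3]]` keeps a 2-D array of 150 samples by 2 features, whereas `[:, 2, 3]` asks NumPy for three axes and raises an `IndexError`.

```python
from sklearn import datasets
import numpy as np

iris = datasets.load_iris()

# Columns 2 and 3 of iris.data hold petal length and petal width (cm)
print(iris.feature_names[2:4])

# A list of column indices selects both columns and keeps a 2-D array
X = iris.data[:, [2, 3]]
print(X.shape)  # (150, 2)

# Three separate indices on a 2-D array are rejected by NumPy
try:
    iris.data[:, 2, 3]
except IndexError as err:
    print('IndexError:', err)
```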

code: Python

from sklearn.model_selection import train_test_split

# Split into training and test data, using 30% of all samples as test data
# stratify=y keeps the class-label proportions of both splits equal to those of the input data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1, stratify=y)
print('Labels counts in y:', np.bincount(y))
print('Labels counts in y_train:', np.bincount(y_train))
print('Labels counts in y_test:', np.bincount(y_test))

------------------------------------------------------------------------------

Labels counts in y: [50 50 50]

Labels counts in y_train: [35 35 35]

Labels counts in y_test: [15 15 15]

------------------------------------------------------------------------------
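To see what `stratify=y` changes, the sketch below splits the same data twice, once with and once without stratification. With stratification each class keeps its one-third share (15 test samples per class); without it the per-class counts depend on the random shuffle and are generally not equal. The variable names are illustrative only.

```python
from sklearn import datasets
from sklearn.model_selection import train_test_split
import numpy as np

iris = datasets.load_iris()
X = iris.data[:, [2, 3]]
y = iris.target

# Stratified split: every class keeps its 1/3 share in both subsets
_, _, y_tr_strat, y_te_strat = train_test_split(
    X, y, test_size=0.3, random_state=1, stratify=y)

# Plain random split: class counts depend on the shuffle
_, _, y_tr_rand, y_te_rand = train_test_split(
    X, y, test_size=0.3, random_state=1)

print('stratified test counts:  ', np.bincount(y_te_strat))
print('unstratified test counts:', np.bincount(y_te_rand, minlength=3))
```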

code: Python

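from sklearn.preprocessing import StandardScaler

sc = StandardScaler()
# Compute the mean and standard deviation of the training data
sc.fit(X_train)
# Standardize both sets with the training mean and standard deviation
X_train_std = sc.transform(X_train)
X_test_std = sc.transform(X_test)

from sklearn.linear_model import Perceptron

# Perceptron with up to 40 epochs and learning rate 0.1
# (scikit-learn < 0.21 named this parameter n_iter; newer versions may not reproduce the numbers below exactly)
ppn = Perceptron(max_iter=40, eta0=0.1, random_state=1)
# Fit the model to the training data
ppn.fit(X_train_std, y_train)
# Predict the test data
y_pred = ppn.predict(X_test_std)
# Count the misclassified samples
mis_samples = (y_test != y_pred).sum()
print('Misclassified samples: %d' % mis_samples)
# Accuracy (%) = (1 - misclassification rate) * 100
print('Accuracy score: %d' % ((1 - mis_samples / len(y_test)) * 100))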

------------------------------------------------------------------------------

Misclassified samples: 3

Accuracy score: 93

------------------------------------------------------------------------------
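`StandardScaler` computes the per-feature mean and standard deviation on the training data only and then applies `(x - mean) / std` to both sets, so the test data never influences the scaling. The sketch below reproduces that by hand and checks it against the scaler (NumPy's default `std`, with `ddof=0`, matches `StandardScaler.scale_`).

```python
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
import numpy as np

iris = datasets.load_iris()
X = iris.data[:, [2, 3]]
y = iris.target
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=1, stratify=y)

sc = StandardScaler().fit(X_train)

# Statistics come from the training data only (population std, ddof=0)
mu = X_train.mean(axis=0)
sigma = X_train.std(axis=0)

X_train_manual = (X_train - mu) / sigma
# The test data is scaled with the *training* statistics
X_test_manual = (X_test - mu) / sigma

print(np.allclose(sc.mean_, mu), np.allclose(sc.scale_, sigma))
print(np.allclose(sc.transform(X_test), X_test_manual))
```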

code: Python

from sklearn.metrics import accuracy_score

# Print the classification accuracy (both calls compute the same value)
print('Accuracy: %.2f' % accuracy_score(y_test, y_pred))
print('Accuracy: %.2f' % ppn.score(X_test_std, y_test))

------------------------------------------------------------------------------

Accuracy: 0.93

Accuracy: 0.93

------------------------------------------------------------------------------
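Accuracy is just the fraction of correct predictions, so `accuracy_score` and the classifier's own `score` method agree with a plain `np.mean(y_pred == y_test)`. The sketch below rebuilds the pipeline and checks this; it also prints `coef_`, whose shape `(3, 2)` shows that scikit-learn handles the three classes one-vs-rest, with one weight vector per class. Exact accuracy values may vary across scikit-learn versions, so none are asserted.

```python
from sklearn import datasets
from sklearn.linear_model import Perceptron
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
import numpy as np

iris = datasets.load_iris()
X = iris.data[:, [2, 3]]
y = iris.target
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=1, stratify=y)
sc = StandardScaler().fit(X_train)
X_train_std, X_test_std = sc.transform(X_train), sc.transform(X_test)

ppn = Perceptron(max_iter=40, eta0=0.1, random_state=1)
ppn.fit(X_train_std, y_train)
y_pred = ppn.predict(X_test_std)

# Accuracy = share of test samples whose predicted label is correct
manual = np.mean(y_pred == y_test)
print(manual, accuracy_score(y_test, y_pred), ppn.score(X_test_std, y_test))

# One-vs-rest: one weight vector (2 features) per class
print(ppn.coef_.shape)
```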

Plotting the decision regions

code: Python

from matplotlib.colors import ListedColormap
import matplotlib.pyplot as plt

def plot_decision_regions(X, y, classifier, test_idx=None, resolution=0.02):
    # Prepare the markers and a color map with one color per class
    markers = ('s', 'x', 'o', '^', 'v')
    colors = ('red', 'blue', 'lightgreen', 'gray', 'cyan')
    cmap = ListedColormap(colors[:len(np.unique(y))])

    # Plot the decision regions: range of each feature, padded by 1
    x1_min, x1_max = X[:, 0].min() - 1, X[:, 0].max() + 1
    x2_min, x2_max = X[:, 1].min() - 1, X[:, 1].max() + 1
    # Generate the grid points
    xx1, xx2 = np.meshgrid(np.arange(x1_min, x1_max, resolution),
                           np.arange(x2_min, x2_max, resolution))
    # Flatten each feature grid to 1-D and predict every grid point
    Z = classifier.predict(np.array([xx1.ravel(), xx2.ravel()]).T)
    # Reshape the predictions back to the grid shape
    Z = Z.reshape(xx1.shape)
    # Fill the regions of each predicted class
    plt.contourf(xx1, xx2, Z, alpha=0.3, cmap=cmap)
    # Set the axis ranges
    plt.xlim(xx1.min(), xx1.max())
    plt.ylim(xx2.min(), xx2.max())

    # Plot the samples class by class
    for idx, cl in enumerate(np.unique(y)):
        plt.scatter(x=X[y == cl, 0], y=X[y == cl, 1], alpha=0.8, c=colors[idx],
                    marker=markers[idx], label=cl, edgecolors='black')

    # Highlight the test samples (drawn as open circles)
    if test_idx is not None:
        # Extract the test samples
        X_test, y_test = X[test_idx, :], y[test_idx]
        plt.scatter(X_test[:, 0], X_test[:, 1], facecolors='none', edgecolors='black',
                    alpha=1.0, linewidths=1, marker='o', s=100, label='test set')

# Stack the training and test features row-wise
X_combined_std = np.vstack((X_train_std, X_test_std))
# Concatenate the training and test class labels
y_combined = np.hstack((y_train, y_test))
# Plot the decision regions; rows 105-149 of the combined data are the test samples
plot_decision_regions(X=X_combined_std, y=y_combined, classifier=ppn, test_idx=range(105, 150))
# Axis labels
plt.xlabel('petal length [standardized]')
plt.ylabel('petal width [standardized]')
# Legend in the upper-left corner
plt.legend(loc='upper left')
# Show the figure
plt.tight_layout()
plt.show()

https://gyazo.com/89d798077dd40de79cc5615d520c0a29
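The region plot works by classifying every point of a dense grid: `meshgrid` builds the grid, `ravel` plus `.T` turns it into an `(n_points, 2)` feature matrix, and `reshape` folds the predictions back into the grid shape for `contourf`. The sketch below traces those shapes on a coarse grid, using a hypothetical stand-in classifier (`predict`, class 1 when `x1 + x2 > 0`) instead of the trained perceptron.

```python
import numpy as np

# Hypothetical stand-in classifier: class 1 if x1 + x2 > 0, else class 0
def predict(points):
    return (points[:, 0] + points[:, 1] > 0).astype(int)

resolution = 0.5
xx1, xx2 = np.meshgrid(np.arange(-2, 2, resolution),   # 8 values along x1
                       np.arange(-1, 1, resolution))   # 4 values along x2
print(xx1.shape)   # (4, 8): rows follow x2, columns follow x1

# Each grid point becomes one row (x1, x2); without .T the shape would be (2, 32)
grid = np.array([xx1.ravel(), xx2.ravel()]).T
print(grid.shape)  # (32, 2)

# Predict all points at once, then restore the grid layout
Z = predict(grid).reshape(xx1.shape)
print(Z)
```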

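As the plot shows, classes 1 and 2 (versicolor and virginica) overlap, so no set of straight lines separates all three classes perfectly. On a dataset that is not perfectly linearly separable, the perceptron learning rule never converges: some samples are misclassified in every epoch, so the weights keep changing until the epoch limit is reached. For this reason the perceptron is rarely used in practice.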
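The convergence claim can be checked with a from-scratch binary perceptron. This is a minimal sketch of the classic Rosenblatt update rule, not scikit-learn's internal implementation, and the helper name `perceptron_errors` is hypothetical. It records the number of weight updates per epoch and stops after the first epoch without updates. Setosa vs versicolor is linearly separable in petal length and width, so the updates reach zero. Versicolor vs virginica contains samples with identical petal measurements but different labels, so every epoch still has at least one update and training only ends at the epoch limit.

```python
from sklearn import datasets
import numpy as np

def perceptron_errors(X, y, eta=0.1, n_epochs=1000):
    """Classic perceptron rule for labels in {-1, 1}; returns updates per epoch."""
    w = np.zeros(X.shape[1])
    b = 0.0
    errors = []
    for _ in range(n_epochs):
        n_updates = 0
        for xi, target in zip(X, y):
            y_hat = 1 if xi @ w + b >= 0.0 else -1
            update = eta * (target - y_hat)  # non-zero only on a mistake
            if update != 0.0:
                w += update * xi
                b += update
                n_updates += 1
        errors.append(n_updates)
        if n_updates == 0:  # a full pass without mistakes: converged
            break
    return errors

iris = datasets.load_iris()
X = iris.data[:, [2, 3]]
y = iris.target

# Setosa (0) vs versicolor (1): linearly separable
mask = y != 2
err_sep = perceptron_errors(X[mask], np.where(y[mask] == 0, -1, 1))

# Versicolor (1) vs virginica (2): overlapping classes
mask = y != 0
err_overlap = perceptron_errors(X[mask], np.where(y[mask] == 1, -1, 1))

print('separable:   epochs =', len(err_sep), 'last updates =', err_sep[-1])
print('overlapping: epochs =', len(err_overlap), 'min updates =', min(err_overlap))
```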