sklearn.model_selection.StratifiedShuffleSplit

Stratified ShuffleSplit cross-validator

Provides train/test indices to split data in train/test sets.

The _iter_indices method

When y.ndim == 2, each row of y is converted to a string that represents its multi-label

for multi-label y, map each distinct row to a string repr

Example: the multi-label [0, 0, 0, 0, 1] becomes "0 0 0 0 1"

In the multi-label case, every element (row) of y is converted to a string this way
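A minimal sketch of the conversion described above. The indicator matrix `y` is made up for illustration, and the join-based expression is my reading of what `_iter_indices` does, not a verbatim copy of the library code.

```python
import numpy as np

# Hypothetical multi-label indicator matrix: 3 samples, 5 labels.
y = np.array([
    [0, 0, 0, 0, 1],
    [0, 1, 0, 0, 1],
    [0, 0, 0, 0, 1],
])

# Join each row into one space-separated string, so that every distinct
# label combination can later be treated as a single class.
y_str = np.array([" ".join(row.astype("str")) for row in y])

print(y_str)        # ['0 0 0 0 1' '0 1 0 0 1' '0 0 0 0 1']
print(y_str.ndim)   # 1
```

After this step y is one-dimensional, and identical rows map to identical strings.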

np.unique is then applied to y in that string form

classes, y_indices = np.unique(y, return_inverse=True)

code:breakpoint.py

(Pdb) classes # the distinct elements of y

array(['0 0 0 0 0', '0 0 0 0 1', '0 0 0 1 0', '0 0 0 1 1', '0 1 0 0 0',
       '0 1 0 0 1', '0 1 0 1 0', '0 1 0 1 1', '1 0 0 0 0', '1 0 0 0 1',
       '1 0 0 1 0', '1 1 0 0 0', '1 1 0 0 1', '1 1 0 1 0', '1 1 0 1 1',
       '1 1 1 1 1'], dtype='<U9')

(Pdb) y_indices # maps each element back to its class: y[0] is classes[1], y[1] is classes[5], and so on

array([ 1, 5, 0, 6, 4, ...
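The relationship between `classes` and `y_indices` can be checked on a small example. This is only a sketch: the string labels below are made up and are not the data from the pdb session.

```python
import numpy as np

# Hypothetical string-encoded multi-labels (made up for illustration).
y = np.array(["0 0 0 0 1", "0 1 0 0 1", "0 0 0 0 0", "0 0 0 0 1"])

# classes: the sorted distinct values of y
# y_indices: for each element of y, its position in classes
classes, y_indices = np.unique(y, return_inverse=True)

print(classes)    # ['0 0 0 0 0' '0 0 0 0 1' '0 1 0 0 1']
print(y_indices)  # [1 2 0 1]

# Indexing classes with y_indices reconstructs the original array.
restored = classes[y_indices]
print(np.array_equal(restored, y))  # True
```

Because `classes` is sorted, the integer in `y_indices` is a stable class ID that later steps can count and group by.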

Count the values in y_indices (which class each element of y belongs to) with np.bincount

code:breakpoint.py

(Pdb) np.bincount(y_indices) # classes 3 and 7 have only one sample each
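A small sketch of the counting step. The `y_indices` array here is hypothetical, chosen so that classes 3 and 7 each appear once as in the note above; it is not the actual array from the session.

```python
import numpy as np

# Hypothetical class IDs for 14 samples over 8 classes (made up).
y_indices = np.array([1, 5, 0, 6, 4, 3, 7, 1, 5, 0, 6, 4, 2, 2])

# class_counts[i] is the number of samples whose class ID is i.
class_counts = np.bincount(y_indices)
print(class_counts)  # [2 2 2 1 2 2 2 1]

# Class IDs that have exactly one sample.
rare = np.flatnonzero(class_counts == 1)
print(rare)  # [3 7]
```

Single-sample classes are worth spotting because a stratified split cannot place such a class in both the train and the test set; as far as I know, StratifiedShuffleSplit raises a ValueError when the smallest class has fewer than two members.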