sklearn.model_selection.StratifiedShuffleSplit
https://github.com/scikit-learn/scikit-learn/blob/1.1.2/sklearn/model_selection/_split.py#L1877
Stratified ShuffleSplit cross-validator
Provides train/test indices to split data in train/test sets.
https://scikit-learn.org/stable/modules/cross_validation.html#stratified-shuffle-split
The base class is sklearn.model_selection._split.BaseShuffleSplit
_iter_indices method
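A minimal usage sketch of the cross-validator described above. The toy `X` and `y`, and the `n_splits`, `test_size`, and `random_state` values, are assumptions chosen for illustration and do not come from the linked source.

```python
import numpy as np
from sklearn.model_selection import StratifiedShuffleSplit

# Toy data (assumed for illustration): 10 samples, 2 balanced classes.
X = np.arange(20).reshape(10, 2)
y = np.array([0] * 5 + [1] * 5)

sss = StratifiedShuffleSplit(n_splits=3, test_size=0.4, random_state=0)

# Names of the classes in the inheritance chain of the splitter.
mro_names = [cls.__name__ for cls in type(sss).__mro__]
print(mro_names)

# split() yields (train indices, test indices) pairs, one per shuffle.
splits = list(sss.split(X, y))
for train_idx, test_idx in splits:
    # Per-class counts in each part show the class ratio is preserved.
    print(train_idx, test_idx, np.bincount(y[train_idx]), np.bincount(y[test_idx]))
```

With two equally sized classes, each train and test part should contain the same number of samples from both classes.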
https://github.com/scikit-learn/scikit-learn/blob/1.1.2/sklearn/model_selection/_split.py#L1948-L2014
When y.ndim == 2, each row of y is converted to a string that represents its multilabel combination
for multi-label y, map each distinct row to a string repr
https://github.com/scikit-learn/scikit-learn/blob/1.1.2/sklearn/model_selection/_split.py#L1961
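Example: the multilabel row [0, 0, 0, 0, 1] becomes "0 0 0 0 1"
In the multilabel case, every element of y is therefore a string (one per row)
np.unique is then applied to y in that state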
classes, y_indices = np.unique(y, return_inverse=True)
https://github.com/scikit-learn/scikit-learn/blob/1.1.2/sklearn/model_selection/_split.py#L1963
code:breakpoint.py
(Pdb) classes # the distinct elements of y
array(['0 0 0 0 0', '0 0 0 0 1', '0 0 0 1 0', '0 0 0 1 1', '0 1 0 0 0',
'0 1 0 0 1', '0 1 0 1 0', '0 1 0 1 1', '1 0 0 0 0', '1 0 0 0 1',
'1 0 0 1 0', '1 1 0 0 0', '1 1 0 0 1', '1 1 0 1 0', '1 1 0 1 1',
'1 1 1 1 1'], dtype='<U9')
(Pdb) y_indices # maps each sample back to its class: y[0] is classes[1], y[1] is classes[5], and so on
array([ 1, 5, 0, 6, 4, ...
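The conversion and the np.unique call can be reproduced outside the debugger. This is a sketch that mirrors the step on a small, made-up multilabel `y`; the variable name `y_str` and the data are assumptions for illustration, not the data inspected in the Pdb session.

```python
import numpy as np

# Toy multilabel target (assumed): 4 samples, 5 binary labels each.
y = np.array([
    [0, 0, 0, 0, 1],
    [0, 1, 0, 0, 1],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 1],
])

# Map each row to a string so that every distinct label combination
# can be treated as a single class.
y_str = np.array([" ".join(row.astype("str")) for row in y])

# classes: sorted distinct strings.
# y_indices: for each sample, the position of its string in classes.
classes, y_indices = np.unique(y_str, return_inverse=True)

print(classes)
print(y_indices)

# Indexing classes with y_indices reconstructs the string form of y.
restored = classes[y_indices]
```

Rows 0 and 3 share the same label combination, so they receive the same index in `y_indices`.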
np.bincount counts the entries of y_indices (which class each element of y belongs to), giving the number of samples per class
https://github.com/scikit-learn/scikit-learn/blob/1.1.2/sklearn/model_selection/_split.py#L1966
code:breakpoint.py
(Pdb) np.bincount(y_indices) # classes 3 and 7 have only one sample each
array([12, 2, 4, 1, 9, 5, 12, 1, 11, 4, 5, 11, 4, 10, 4, 5])
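The counting step can be sketched the same way. The `y_indices` values below are made up for illustration. The second half relies on my reading of the linked source, where a class with fewer than 2 samples appears to be rejected with a ValueError; treat that as an assumption to verify against the scikit-learn version in use.

```python
import numpy as np
from sklearn.model_selection import StratifiedShuffleSplit

# Toy class indices (assumed): class 2 appears only once.
y_indices = np.array([0, 0, 0, 1, 1, 2])

# class_counts[k] is the number of samples whose class index is k.
class_counts = np.bincount(y_indices)
print(class_counts)

# Classes that have a single sample.
singletons = np.flatnonzero(class_counts == 1)
print(singletons)

# Attempt a stratified split on the same labels and record whether it fails.
X = np.zeros((len(y_indices), 1))
sss = StratifiedShuffleSplit(n_splits=1, test_size=0.5, random_state=0)
raised = False
try:
    next(sss.split(X, y_indices))
except ValueError as exc:
    raised = True
    print(exc)
```

Checking `np.bincount` output before splitting is a cheap way to find classes that are too rare to stratify.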