Reading the CMT paper carefully

Traditional mean teacher can be seriously affected by the inaccurate predictions of unlabeled data. Therefore, we propose the confident mean teacher (CMT) method to address the pseudo-label accuracy problem. The core idea of CMT is to correct inaccurate predictions from the teacher by post-processing operations and train the student with high confidence labels. The structure of CMT is shown in Fig 2.

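In the conventional Mean Teacher, the unreliability of the teacher model's output was the problem

The paper examines the accuracy of the pseudo-labels

It sets a threshold on the teacher model's output and uses only the reliable outputs as labels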

https://gyazo.com/bbbd0cbba80b17cfdb20940c21d4c45d

The structure of confident mean teacher. $ \~y_w and $ \~y_s denote the clip-wise and frame-wise prediction of teacher. $ f_{\theta ^s} (x)_w and $ f_{\theta ^s} (x)_s denote the prediction of student. $ \hat{y}_w and $ \hat{y}_s denote the corrected pseudo-labels. $ L_{w,con} and $ L_{s,con} denote the clip-wise and frame-wise consistency loss. $ c_w and $ c_s denote the confidence weight.

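~ and ^ are used to tell the variables apart

Pseudo-labeling means assigning pseudo-labels (擬似ラベルの付与). A pseudo label is 擬似ラベル

consistency loss is 一貫性損失

confidence weight is 信頼度重み

Organizing the symbols

w is clip-wise, s is frame-wise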

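$ \~y_w, $ \~y_s

The teacher model's predictions

Is it a value between 0 and 1, or exactly 0 or 1?

What does this refer to? A probability or a binary value?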

$ f_{\theta ^s} (x)_w , $ f_{\theta ^s} (x)_s

The student model's clip-wise / frame-wise predictions

$ \hat{y}_w, $ \hat{y}_s

The ideal pseudo-labels?

$ L_{w,con} , $ L_{s,con}

Consistency loss

$ c_w , $ c_s

Confidence weights

$ T

Number of frames

$ K

Number of sound event classes

$ \phi_{clip}, $ \phi_{frame}

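Thresholds for the clip-wise and frame-wise predictions, respectively

Math notation

I is the indicator function, basically an if

It returns 1 when the condition holds and 0 when it does not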

cf. https://mathlandscape.com/indicator-function/
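
As a minimal sketch of the indicator function in code: the name `indicator`, the threshold value, and the sample scores below are made up for illustration and do not come from the paper.

```python
def indicator(condition: bool) -> int:
    """Return 1 if the condition holds, 0 otherwise."""
    return 1 if condition else 0


phi_clip = 0.5            # hypothetical threshold, chosen only for this example
scores = [0.2, 0.5, 0.9]  # hypothetical prediction scores

# The comparison is strict (>), so a score equal to the threshold maps to 0.
binary = [indicator(s > phi_clip) for s in scores]
print(binary)  # [0, 0, 1]
```

The strict `>` mirrors the comparison used in the thresholding equations of the paper text quoted in these notes.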

[]

Closed interval

In the example below, does it mean a K-dimensional vector of real numbers between 0 and 1?

cf. https://w3e.kanazawa-it.ac.jp/math/category/other/syuugou/henkan-tex.cgi?target=/math/category/other/syuugou/kukann.html&list=1
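
To make the reading of $ [0,1]^K concrete, here is a small sketch that checks whether a list is a K-dimensional vector with every entry in the closed interval [0, 1]. The helper name `in_unit_cube` and the sample values are assumptions for illustration only.

```python
def in_unit_cube(vector, k):
    """Check that `vector` has k entries, each in the closed interval [0, 1]."""
    return len(vector) == k and all(0.0 <= v <= 1.0 for v in vector)


K = 3
y_w = [0.0, 0.37, 1.0]  # the endpoints 0 and 1 belong to a closed interval

print(in_unit_cube(y_w, K))              # True
print(in_unit_cube([0.2, 1.2, 0.5], K))  # False: 1.2 lies outside [0, 1]
```

Under the same reading, $ [0,1]^{T\times K} would be T such vectors stacked, one per frame.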

In particular, we first obtain the clip-wise prediction $ \hat{y}_w \in [0,1]^K and frame-wise prediction $ \hat{y}_s \in [0,1]^{T\times K} from the teacher model. $ T and $ K denote the frame number and sound event class number. Then we set a clip-wise threshold $ \phi_{clip}. If $ \hat{y}_w > \phi_{clip}, \hat{y}_w is assigned to 1. Otherwise, $ \hat{y}_w is assigned to 0. If $ \hat{y}_s > \phi_{clip}, \hat{y}_s is assigned to 0. In addition to weak threshold, we also set the frame-wise threshold $ \phi_{frame}. If $ \hat{y}_s > \phi_{frame}, \hat{y}_s is assigned to 1. Otherwise, $ \hat{y}_s is assigned to 0. After strong threshold, we smooth the frame-wise prediction $ \hat{y}_s with event-specific median filters. These steps can be denoted as follows:

code: latex

\~y_w(k) = I(\hat{y}_{w}(k) > \phi_{clip}) \tag{3}

code: latex

\~y_s(t,k) = MF(I(\hat{y}_{w}(k) > \phi_{clip}) I(\hat{y}_{s}(t,k) > \phi_{frame})) \tag{4}

1 if the threshold is exceeded, 0 otherwise

MF in Eq. (4) is the median filter
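
The thresholding and smoothing steps can be sketched in plain Python as follows. This is only an illustration under several assumptions: the second factor inside the MF is taken to be the frame-wise indicator $ I(\hat{y}_s(t,k) > \phi_{frame}), as the quoted paper text describes; the median filter uses zero padding at the edges; and the function names, window sizes, thresholds, and scores are all hypothetical rather than values from the paper.

```python
from statistics import median


def median_filter(seq, size):
    """Sliding median over a 1-D sequence, zero-padded at both ends (size must be odd)."""
    half = size // 2
    padded = [0] * half + list(seq) + [0] * half
    return [median(padded[i:i + size]) for i in range(len(seq))]


def confident_pseudo_labels(y_w, y_s, phi_clip, phi_frame, filter_sizes):
    """y_w: K clip scores, y_s: T x K frame scores, filter_sizes: K odd window lengths."""
    T, K = len(y_s), len(y_w)
    # Eq. (3): binarize the clip-wise predictions.
    clip = [int(y_w[k] > phi_clip) for k in range(K)]
    frame = [[0] * K for _ in range(T)]
    for k in range(K):
        # Eq. (4), inside MF: the clip-wise mask times the frame-wise threshold.
        raw = [clip[k] * int(y_s[t][k] > phi_frame) for t in range(T)]
        # Event-specific median filter: one window length per class.
        smoothed = median_filter(raw, filter_sizes[k])
        for t in range(T):
            frame[t][k] = smoothed[t]
    return clip, frame


# Hypothetical teacher outputs: T = 5 frames, K = 2 classes.
y_w = [0.9, 0.3]
y_s = [[0.8, 0.9], [0.2, 0.9], [0.9, 0.9], [0.7, 0.9], [0.1, 0.9]]
clip, frame = confident_pseudo_labels(y_w, y_s, 0.5, 0.5, filter_sizes=[3, 3])
print(clip)   # [1, 0]
print(frame)  # [[0, 0], [1, 0], [1, 0], [1, 0], [0, 0]]
```

In this toy input, class 1 has high frame scores everywhere but a low clip score, so the clip-wise mask zeroes all of its frames. For class 0, the median filter fills the one-frame gap at t = 1 and removes the isolated detection at t = 0.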

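Went through the overall system > /research-custard/卒論のシステムの流れを把握する

The proposal is to apply thresholds (confidence) before the MF, narrowing down the teacher model's predictions

The MF here is merely noise removal

It has nothing to do with the system's post-processing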