Understanding the median filter in sound event detection
from Confident Mean Teacher
Origin of the adaptive median filter idea:
CONVOLUTION-AUGMENTED TRANSFORMER FOR SEMI-SUPERVISED SOUND EVENT DETECTION
Quoted from Section 2.5, Post-processing:
To determine the sound event activation, we perform thresholding for the network output posterior. Then, we perform median filtering as post-processing to smooth the detected activation sequence.
Since each sound event has different characteristics, such as temporal structures, the optimal post-processing parameters depend on the individual sound events. Hence, we determine the optimal postprocessing parameters for each sound event using the validation set.
We search the optimal threshold and median filter size from 0.1 to 0.9 in increments of 0.1, and from 1 to 31 in increments of 2, respectively.
So: since each sound event has a different duration, the median filter window size is a hyperparameter that must be tuned per event, and that tuning was done on the validation set?
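The search described in the quote (thresholds 0.1 to 0.9 in steps of 0.1, window sizes 1 to 31 in steps of 2) can be sketched as a simple per-event grid search. This is a minimal sketch, not the authors' code; `score_fn` stands in for whatever validation metric they maximize (e.g. event-based F1):

```python
import numpy as np
from scipy.signal import medfilt

def grid_search_postprocessing(posteriors, labels, score_fn):
    """Per-event grid search over threshold and median filter size (sketch).

    posteriors: (T,) frame-level posteriors for one event class
    labels:     (T,) binary ground-truth activations (validation set)
    score_fn:   validation metric to maximize (assumption)
    """
    best = (None, None, -np.inf)
    for thr in np.arange(0.1, 1.0, 0.1):       # 0.1 .. 0.9
        binarized = (posteriors > thr).astype(float)
        for win in range(1, 32, 2):            # 1, 3, ..., 31 (odd sizes)
            smoothed = medfilt(binarized, kernel_size=win)
            score = score_fn(smoothed, labels)
            if score > best[2]:
                best = (thr, win, score)
    return best  # (threshold, window size, score)
```

The window sizes are odd by construction, which is also what `medfilt` requires.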
FMSG-JLESS SUBMISSION FOR DCASE 2024 TASK4 ON SOUND EVENT DETECTION WITH HETEROGENEOUS TRAINING DATASET AND POTENTIALLY MISSING LABELS says to refer to FMSG SUBMISSION FOR DCASE 2023 CHALLENGE TASK 4 ON SOUND EVENT DETECTION WITH WEAK LABELS AND SYNTHETIC SOUNDSCAPES
Quoted from the latter:
In all experiments, we employed adaptive median filtering (MF) technique. This approach involved the application of median filters with varying window sizes, denoted as $W_{in_c}$, based on the duration of real-life event categories $c$. The specific window sizes for each event category are presented below:
code: tex
W_{in_c} = duration_c \times \beta_c \quad (2)
In order to handle event categories with significant duration variation, we employed a dynamic approach by setting the median duration $duration_c$ as the reference. For this purpose, we initially set the parameter $\beta_c = 1/3$ and fine-tuned the window sizes based on the development set, ensuring optimal performance.
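A minimal sketch of Eq. (2): the per-class window is the median class duration scaled by β, converted to frames. The frame hop value (64 ms) and the rounding to an odd size are my assumptions, not from the paper:

```python
import numpy as np

def adaptive_window(durations_sec, beta=1/3, frame_hop=0.064):
    """Per-class median filter window in frames, per Eq. (2) (sketch).

    durations_sec: annotated event durations for one class c
    beta:          scaling factor beta_c, initialized to 1/3 in the paper
    frame_hop:     frame hop in seconds (assumed value)
    """
    ref = np.median(durations_sec)            # median duration as reference
    win = int(round(ref * beta / frame_hop))  # seconds -> frames
    return max(1, win | 1)                    # force an odd size, at least 1
```

For a class whose annotated durations are 1, 2, and 3 seconds, the reference is 2 s, so the window is about 2/3 s of frames, rounded up to the next odd size.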
A median filter is used to consolidate the inference results; its window size is determined by characteristics such as the event's duration
In this model, it is applied as post-processing at evaluation time
After each model's inference,
the outputs are post-processed with thresholding and a median filter
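The post-processing summarized above (threshold, then a per-class median filter) might look like the following sketch; the per-class `thresholds` and `windows` are assumed to come from validation tuning:

```python
import numpy as np
from scipy.signal import medfilt

def postprocess(posteriors, thresholds, windows):
    """Threshold then median-filter each class track (sketch).

    posteriors: (C, T) per-class frame posteriors (e.g. an ensemble average)
    thresholds: per-class thresholds (assumed tuned on validation data)
    windows:    per-class odd median filter sizes (assumed tuned likewise)
    """
    out = np.zeros_like(posteriors)
    for c in range(posteriors.shape[0]):
        binarized = (posteriors[c] > thresholds[c]).astype(float)
        out[c] = medfilt(binarized, kernel_size=windows[c])
    return out
```

Note that `medfilt` zero-pads at the edges, so an activation touching the clip boundary can be shortened by the filter.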
Is SBSS meant to replace this post-processing entirely?