itakura-2024-09-20 Progress report

What I did

・Experiments on SegFormer

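I took the SegFormer code from GitHub and re-ran the experiments; the accuracy was higher than with the Hugging Face implementation.

Using that code, I checked the accuracy with SphericalPE applied, training for 10 epochs.

The plain SegFormer still gives the best accuracy.

How can translation equivariance be improved?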
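The report does not spell out how the meanE and disR columns are computed. As a rough sketch, assuming meanE is the mean per-pixel entropy of the labels predicted for the same pixel across shifted crops, and disR is the fraction of pixels whose label is not identical across all shifts, the measurement could look like this. Both definitions are my assumption, and `shift_consistency` is a hypothetical helper name.

```python
import math
from collections import Counter


def shift_consistency(aligned_preds):
    """Consistency of predictions under translation (assumed definitions).

    aligned_preds: list of K label lists, each of length N. Entry [k][p] is the
    class predicted for pixel p when the input crop was shifted by the k-th
    offset, after re-aligning the outputs to the same pixel grid.
    Returns (mean_entropy, disagreement_rate).
    """
    k = len(aligned_preds)
    n = len(aligned_preds[0])
    entropy_sum = 0.0
    disagree = 0
    for p in range(n):
        # Histogram of the labels this pixel received over all shifts.
        counts = Counter(pred[p] for pred in aligned_preds)
        entropy_sum += -sum((c / k) * math.log(c / k) for c in counts.values())
        # The pixel "disagrees" if at least two shifts gave different labels.
        if len(counts) > 1:
            disagree += 1
    return entropy_sum / n, disagree / n


# Three shifts, four pixels; only the third pixel changes its label.
preds = [[0, 1, 2, 2],
         [0, 1, 1, 2],
         [0, 1, 2, 2]]
mean_e, dis_r = shift_consistency(preds)
```

Under these assumed definitions, lower values of both quantities mean the predictions are more stable under translation, which matches the ↓ arrows in the table header.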

table: Quantitative evaluation
 Methods	meanE_in↓	disR_in↓	meanE_ex↓	disR_ex↓	mIoU↑	mIoU_weighted↑
 Zero	0.1950	0.3119	0.1930	0.3102	0.4573	0.7449
 PP-Pad(2x3 Conv)(向井)	0.1840	0.2975	0.1821	0.2957	0.4717	0.8067
 PP-Pad(2x3 Conv)(葉)	0.1616	0.2763	0.1600	0.2747	0.4923	0.8276
 SegFormer	0.1169	0.2518	0.1150	0.2498	0.5685	0.9092
 HuggingFace	0.6391	0.7633	0.6368	0.7627	0.2157	0.3778
 SegFormer(Mix→spe)	0.2606	0.4093	0.2609	0.4098	0.1489	0.3146
 SegFormer(SPE only)	0.4789	0.6554	0.4802	0.6568	0.0828	0.2963
 SegFormer(SPE+Mix)	0.3288	0.5089	0.3264	0.5076	0.4435	0.8474
 SegFormer(learnable SPE)	0.3516	0.5349	0.3520	0.5355	0.1297	0.3209
 PP-Pad(2x3 Conv)(20epo)	0.181	0.2968	0.1794	0.2952	0.4695	0.7817
 SegFormer(20epo)	0.1095	0.2254	0.1075	0.2232	0.5797	0.9127
 HuggingFace(20epo)	0.2128	0.3493	0.2110	0.3478	0.5053	0.8427
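
For reference, the two accuracy columns can be computed from a confusion matrix. This is a minimal sketch, assuming mIoU_weighted means frequency-weighted IoU, where each class IoU is weighted by its share of ground-truth pixels. The report does not state this definition, and `iou_scores` is a hypothetical helper.

```python
def iou_scores(conf):
    """Return (mIoU, frequency-weighted IoU) from a confusion matrix.

    conf[i][j] = number of pixels whose ground-truth class is i and whose
    predicted class is j.
    """
    n = len(conf)
    total = sum(sum(row) for row in conf)
    ious, freqs = [], []
    for c in range(n):
        tp = conf[c][c]
        gt = sum(conf[c])                          # pixels whose true class is c
        pred = sum(conf[r][c] for r in range(n))   # pixels predicted as c
        union = gt + pred - tp
        if union == 0:
            continue  # class absent from both ground truth and prediction
        ious.append(tp / union)
        freqs.append(gt / total)
    miou = sum(ious) / len(ious)
    weighted = sum(f * i for f, i in zip(freqs, ious))
    return miou, weighted


# Toy two-class example.
miou, weighted = iou_scores([[3, 1],
                             [1, 5]])
```

With this weighting, frequent classes dominate the weighted score, which would be consistent with mIoU_weighted being much higher than mIoU in the table.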

・Estimating the padding for PSPNet with a Transformer

As in PP-Pad, two columns of pixels are extracted and passed through a Transformer network.
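
The idea above can be sketched in PyTorch roughly as follows. This is not the code used in the experiments. The class name `TransformerPadding`, the choice of one token per row, the absence of positional encoding, and the handling of only the left border are all assumptions made for illustration.

```python
import torch
import torch.nn as nn


class TransformerPadding(nn.Module):
    """Predict one padding column on the left side from the two border columns."""

    def __init__(self, channels, emb=32, depth=4, heads=4):
        super().__init__()
        # Each row of the 2-column strip becomes one token of size 2 * channels.
        self.embed = nn.Linear(2 * channels, emb)
        layer = nn.TransformerEncoderLayer(
            d_model=emb, nhead=heads, dim_feedforward=2 * emb, batch_first=True
        )
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)
        # Project each token back to one padding pixel (all channels).
        self.head = nn.Linear(emb, channels)

    def forward(self, x):
        # x: (B, C, H, W)
        strip = x[:, :, :, :2]                         # (B, C, H, 2)
        tokens = strip.permute(0, 2, 1, 3).flatten(2)  # (B, H, 2C)
        hidden = self.encoder(self.embed(tokens))      # (B, H, emb)
        pad = self.head(hidden)                        # (B, H, C)
        pad = pad.permute(0, 2, 1).unsqueeze(-1)       # (B, C, H, 1)
        # Attach the predicted column to the left of the feature map.
        return torch.cat([pad, x], dim=3)              # (B, C, H, W + 1)


padder = TransformerPadding(channels=3, emb=32, depth=2)
features = torch.randn(2, 3, 8, 10)
padded = padder(features)
```

The other three borders could be handled by flipping or transposing the feature map before calling the module. Here `emb` and `depth` play the roles of the embedding dimension and the number of blocks varied in the experiments.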

https://gyazo.com/027d0ae6f4077ffb12882fc5fb61c5db

I ran 10-epoch experiments while varying the pixel embedding dimension and the number of Transformer blocks.

emb: embedding dimension, depth: number of blocks

table: Quantitative evaluation
 Methods	meanE_in↓	disR_in↓	meanE_ex↓	disR_ex↓	mIoU↑	mIoU_weighted↑
 Zero	0.2038	0.3752	0.2015	0.3733	0.4748	0.8583
 PP-Pad(2x3 Conv)(向井)	0.1840	0.2975	0.1821	0.2957	0.4717	0.8067
 PP-Pad(2x3 Conv)(葉)	0.1616	0.2763	0.1600	0.2747	0.4923	0.8276
 emb=256,depth=1	0.1752	0.3049	0.1733	0.3033	0.4692	0.7670
 emb=128,depth=2	0.1837	0.2962	0.1820	0.2945	0.4787	0.7903
 emb=64, depth=3	0.1776	0.2829	0.1758	0.2811	0.4820	0.7951
 emb=32, depth=4	0.1702	0.2938	0.1680	0.2922	0.4851	0.7902
 emb=32, depth=4(30epo)	0.1936	0.3079	0.1920	0.3062	0.4502	0.7182
 	0.1768	0.2848	0.1745	0.2828	0.4911	0.8254
 SegFormer	0.1169	0.2518	0.1150	0.2498	0.5685	0.9092

Next steps

Either improve the translation equivariance of SegFormer, or improve the accuracy of the method that estimates padding with a Transformer.