R1-V - work4ai

A paper reporting that applying RLVR (reinforcement learning with verifiable rewards) to Qwen2-VL-2B made it outperform Qwen2-VL-72B

R1-V

Reinforcing Super Generalization Ability in Vision Language Models with Less Than $3

The 2B model outperforms the 72B model in OOD tests within just 100 training steps.

https://gyazo.com/c3a5ea2c37bd1d36f8adba987a1fbd0b

With just 2B!? morisoba65536.icon

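That was my first reaction, but judging from https://x.com/liangchen5518/status/1886171667522842856, the model was not heavily tuned as a general reasoning model; it appears to have been trained specifically on tasks such as counting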
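
For reference, a "verifiable reward" for a counting task can be as simple as a rule-based check against the known count. The following is only a minimal sketch and not R1-V's actual code: the function name `counting_reward` and the `<answer>...</answer>` output format are assumptions made for illustration.

```python
import re


def counting_reward(completion: str, ground_truth: int) -> float:
    """Hypothetical rule-based reward: 1.0 if the counted number is correct, else 0.0.

    Assumes (for illustration only) that the model writes its final count
    as an integer inside <answer>...</answer> tags.
    """
    match = re.search(r"<answer>\s*(-?\d+)\s*</answer>", completion)
    if match is None:
        # No parseable answer: no reward.
        return 0.0
    return 1.0 if int(match.group(1)) == ground_truth else 0.0
```

Because a reward like this is computed by a fixed rule rather than by a learned reward model, it only covers tasks where the answer can be checked mechanically, which fits the observation that the training focused on things like counting.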