R1-V
Reinforcing Super Generalization Ability in Vision Language Models with Less Than $3
The 2B model outperforms the 72B model in OOD tests within just 100 training steps.
https://gyazo.com/c3a5ea2c37bd1d36f8adba987a1fbd0b
With 2B!? morisoba65536.icon