DeepScaleR
https://github.com/agentica-project/deepscaler
https://pretty-radio-b75.notion.site/DeepScaleR-Surpassing-O1-Preview-with-a-1-5B-Model-by-Scaling-RL-19681902c1468005bed8ca303013a4e2
https://huggingface.co/agentica-org/DeepScaleR-1.5B-Preview
https://gyazo.com/9392eb7fc8264a7c4057fc24074a9411
#Agenda
So with distillation + GRPO, a 1.5B model has surpassed OpenAI o1-preview, at least in math ability…
morisoba65536.icon
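However, judging from reports on X, it really does seem to be specialized for math (it can't do anything other than math), though that is somewhat unavoidable given its size.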
morisoba65536.icon