DeepScaleR
https://github.com/agentica-project/deepscaler agentica-project/deepscaler
https://pretty-radio-b75.notion.site/DeepScaleR-Surpassing-O1-Preview-with-a-1-5B-Model-by-Scaling-RL-19681902c1468005bed8ca303013a4e2
https://huggingface.co/agentica-org/DeepScaleR-1.5B-Preview agentica-org/DeepScaleR-1.5B-Preview
https://gyazo.com/9392eb7fc8264a7c4057fc24074a9411
#Agenda
So with distillation + GRPO, a 1.5B model has surpassed OpenAI o1-preview, at least on math ability alone... morisoba65536.icon
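That said, judging from reports on X, it really does seem to be math-specialized (it can't do anything other than math), though that is somewhat unavoidable given its size. morisoba65536.icon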