GRPO - yuyan

GRPO

Understanding DeepSeek

Trying out GRPO, which DeepSeek also uses, with trl

https://zenn.dev/ksterx/articles/0b0e707e5329e9

trl

https://github.com/huggingface/trl

Train your own R1 reasoning model with Unsloth (GRPO)

https://unsloth.ai/blog/r1-reasoning

Training your own R1 reasoning model with Unsloth

https://note.com/npaka/n/nd99a395b404f

An explanation of the reinforcement learning method GRPO, implemented on the CartPole task

https://zenn.dev/mkj/articles/10dfe35cd32026

Long-context GRPO

https://unsloth.ai/blog/grpo

Fundamentals of reinforcement learning for LLMs

https://zenn.dev/questlico/articles/4067769a826160

AI Mathematical Olympiad - Progress Prize 2

https://www.kaggle.com/competitions/ai-mathematical-olympiad-progress-prize-2