Test-time scaling
OpenAI o1, o3
Prompt Engineering
Understanding DeepSeek
s1: Simple test-time scaling
https://arxiv.org/pdf/2501.19393
Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters
https://arxiv.org/abs/2408.03314
Dyve: Thinking Fast and Slow for Dynamic Process Verification
https://arxiv.org/pdf/2502.11157
NaturalReasoning is a large-scale dataset for general reasoning tasks.
https://huggingface.co/datasets/facebook/natural_reasoning
S*: Test Time Scaling for Code Generation
https://arxiv.org/abs/2502.14382