MMLU - work4ai

MMLU

https://arxiv.org/abs/2009.03300 Measuring Massive Multitask Language Understanding

#LLM_benchmark

Claude 3.5 Sonnet scores 90.4%, surpassing GPT-4 wogikaze.icon

As of this note, GPT-4 held the top score

Steering at the Frontier: Extending the Power of Prompting - Microsoft Research

https://gyazo.com/dd86c024e89ef8bace3364d07ffb8d44

https://www.youtube.com/watch?v=hVade_8H8mE

SmartGPT: Major Benchmark Broken - 89.0% on MMLU + Exam's Many Errors