lm-evaluation-harness - work4ai

lm-evaluation-harness

https://github.com/EleutherAI/lm-evaluation-harness (EleutherAI)

https://github.com/Stability-AI/lm-evaluation-harness (master)

A framework for few-shot evaluation of autoregressive language models

This project provides a unified framework for testing generative language models on a large number of different evaluation tasks.
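As a rough illustration of what "few-shot evaluation across many tasks" means, here is a minimal Python sketch. It does not use the harness's real API: `Example`, `build_prompt`, `evaluate`, `toy_score`, and the toy task names are all hypothetical, and the "model" is a stand-in scoring function rather than a real language model. The idea it shows is that each task supplies examples, the first `num_fewshot` of them are prepended to the prompt as solved demonstrations, and the model is judged by which answer choice it scores highest.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence


# Hypothetical types for illustration only; not the harness's real API.
@dataclass
class Example:
    question: str
    choices: List[str]
    answer: int  # index of the correct choice


# A "model" here is any function that scores a continuation given a context.
ScoreFn = Callable[[str, str], float]


def build_prompt(shots: Sequence[Example], query: Example) -> str:
    """Prepend the solved examples (the "shots") to the unanswered query."""
    parts = [f"Q: {ex.question}\nA: {ex.choices[ex.answer]}" for ex in shots]
    parts.append(f"Q: {query.question}\nA:")
    return "\n\n".join(parts)


def evaluate(score: ScoreFn, tasks: Dict[str, List[Example]],
             num_fewshot: int) -> Dict[str, float]:
    """Run every task through the same loop and report accuracy per task."""
    results: Dict[str, float] = {}
    for name, examples in tasks.items():
        shots, queries = examples[:num_fewshot], examples[num_fewshot:]
        if not queries:
            continue  # nothing left to evaluate after taking the shots
        correct = 0
        for query in queries:
            prompt = build_prompt(shots, query)
            scores = [score(prompt, " " + choice) for choice in query.choices]
            predicted = scores.index(max(scores))
            correct += int(predicted == query.answer)
        results[name] = correct / len(queries)
    return results


def toy_score(context: str, continuation: str) -> float:
    """Stand-in for a language model: it simply prefers even numbers."""
    return 1.0 if int(continuation.strip()) % 2 == 0 else 0.0


tasks = {
    "toy_sums": [
        Example("1+1?", ["2", "3"], 0),
        Example("2+2?", ["4", "5"], 0),
        Example("3+3?", ["7", "6"], 1),
    ],
    "toy_mixed": [
        Example("1+2?", ["3", "4"], 0),
        Example("2+3?", ["5", "6"], 0),
        Example("1+3?", ["4", "5"], 0),
    ],
}

results = evaluate(toy_score, tasks, num_fewshot=1)
print(results)
```

Because the same `evaluate` loop runs over every entry in `tasks`, adding a new benchmark only means adding another list of examples. That is the sense in which such a framework is "unified".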

https://github.com/Stability-AI/lm-evaluation-harness/tree/jp-stable?s=09 (Japanese version)

https://gyazo.com/dc8aca7fe138dcc2c8e5d6ece42595b1

japanese-gpt-neox-3.6b-instruction-sft ranks first

LLM benchmarks