RouterBench: A Benchmark for Multi-LLM Routing System
https://arxiv.org/abs/2403.12031
We present RouterBench, a novel evaluation framework designed to systematically assess the efficacy of LLM routing systems, along with a comprehensive dataset comprising over 405k inference outcomes from representative LLMs to support the development of routing strategies.
https://github.com/withmartian/routerbench
https://huggingface.co/datasets/withmartian/routerbench
The prompts are taken from standard benchmarks such as MBPP, GSM8K, WinoGrande, HellaSwag, MMLU, MT-Bench, and more.
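
To make concrete what a table of per-prompt, per-model inference outcomes allows, here is a minimal sketch of scoring routing strategies offline. Everything in it is an assumption for illustration: the `OUTCOMES` table, the model names `small` and `large`, the quality and cost numbers, and the helpers `evaluate`, `always`, and `oracle` are hypothetical and do not reflect the actual RouterBench schema or API.

```python
from typing import Callable, Dict, Tuple

# Hypothetical toy outcomes: prompt id -> model name -> (quality in [0, 1], cost in USD).
# These numbers are invented for illustration and are not taken from RouterBench.
Outcomes = Dict[str, Dict[str, Tuple[float, float]]]
Router = Callable[[str], str]

OUTCOMES: Outcomes = {
    "p1": {"small": (1.0, 0.001), "large": (1.0, 0.010)},
    "p2": {"small": (0.0, 0.001), "large": (1.0, 0.010)},
    "p3": {"small": (1.0, 0.001), "large": (0.0, 0.010)},
    "p4": {"small": (0.0, 0.001), "large": (0.0, 0.010)},
}


def evaluate(router: Router, outcomes: Outcomes) -> Tuple[float, float]:
    """Return (mean quality, mean cost) over the models the router picks."""
    picks = [outcomes[prompt_id][router(prompt_id)] for prompt_id in outcomes]
    n = len(picks)
    mean_quality = sum(quality for quality, _ in picks) / n
    mean_cost = sum(cost for _, cost in picks) / n
    return mean_quality, mean_cost


def always(model: str) -> Router:
    """Baseline router that sends every prompt to the same model."""
    return lambda prompt_id: model


def oracle(outcomes: Outcomes) -> Router:
    """Upper-bound router: best quality per prompt, ties broken by lower cost.

    It looks at the recorded outcomes, so it is only usable for offline analysis.
    """
    def route(prompt_id: str) -> str:
        scores = outcomes[prompt_id]
        return max(scores, key=lambda m: (scores[m][0], -scores[m][1]))
    return route


if __name__ == "__main__":
    for name, router in [
        ("always small", always("small")),
        ("always large", always("large")),
        ("oracle", oracle(OUTCOMES)),
    ]:
        quality, cost = evaluate(router, OUTCOMES)
        print(f"{name}: quality={quality:.2f} cost={cost:.5f}")
```

Because the outcomes are precomputed, a new routing strategy can be compared against single-model baselines and an oracle-style upper bound without issuing any new model calls. A learned router would replace `oracle` with a function that sees only the prompt.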