RouterBench: A Benchmark for Multi-LLM Routing System

https://arxiv.org/abs/2403.12031

We present RouterBench, a novel evaluation framework designed to systematically assess the efficacy of LLM routing systems, along with a comprehensive dataset comprising over 405k inference outcomes from representative LLMs to support the development of routing strategies.

https://github.com/withmartian/routerbench

https://huggingface.co/datasets/withmartian/routerbench

The prompts are taken from standard benchmarks such as MBPP, GSM8K, WinoGrande, HellaSwag, MMLU, MT-Bench, and more.
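
To make the idea of scoring a router against precomputed inference outcomes concrete, here is a minimal sketch in Python. It does not use the RouterBench code or dataset schema: the `Outcome` record, the toy `OUTCOMES` table, the model names `small` and `large`, and the `evaluate`, `always`, and `oracle` helpers are all hypothetical and exist only for illustration. The sketch assumes each (prompt, model) pair already has a recorded quality score and cost, so a router can be evaluated by table lookup instead of fresh inference.

```python
from dataclasses import dataclass


@dataclass
class Outcome:
    quality: float  # assumed score, e.g. 1.0 if the answer was correct
    cost: float     # assumed cost of producing the answer, in dollars


# Toy table (made-up numbers): prompt id -> model name -> recorded outcome.
OUTCOMES = {
    "p1": {"small": Outcome(1.0, 0.001), "large": Outcome(1.0, 0.010)},
    "p2": {"small": Outcome(0.0, 0.001), "large": Outcome(1.0, 0.012)},
    "p3": {"small": Outcome(0.0, 0.002), "large": Outcome(0.0, 0.015)},
}


def evaluate(router, outcomes):
    """Return (mean quality, mean cost) of the models the router selects."""
    chosen = [per_model[router(pid, per_model)] for pid, per_model in outcomes.items()]
    n = len(chosen)
    return sum(o.quality for o in chosen) / n, sum(o.cost for o in chosen) / n


def always(model):
    """Baseline router that sends every prompt to one fixed model."""
    return lambda prompt_id, per_model: model


def oracle(prompt_id, per_model):
    """Upper-bound router: peeks at the recorded outcomes and picks the
    highest-quality model, breaking ties by lowest cost. A real router
    would only see the prompt, not the outcomes."""
    return min(per_model, key=lambda m: (-per_model[m].quality, per_model[m].cost))


if __name__ == "__main__":
    for name, router in [("small", always("small")),
                         ("large", always("large")),
                         ("oracle", oracle)]:
        quality, cost = evaluate(router, OUTCOMES)
        print(f"{name:>6}: quality={quality:.3f} cost={cost:.4f}")
```

On this toy table the oracle matches the quality of always using `large` at a fraction of its cost, which is the kind of cost-versus-quality comparison a routing benchmark is meant to expose.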