Language Model Council: Benchmarking Foundation Models on Highly Subjective Tasks by Consensus

J Zhao, FM Plaza-del-Arco, AC Curry - arXiv preprint arXiv:2406.08598, 2024 - arxiv.org
The rapid advancement of Large Language Models (LLMs) necessitates robust and challenging benchmarks. Leaderboards like Chatbot Arena rank LLMs based on how well …