Language models are aligned to emulate the collective voice of many, resulting in outputs that align with no one in particular. Steering LLMs away from generic output is possible …
JP Gardner, S Durand, D Stoller… - Forty-first International …, 2023 - openreview.net
Music has a unique and complex structure which is challenging for both expert humans and existing AI systems to understand, and presents unique challenges relative to other forms of …
The ability to build and leverage world models is essential for a general-purpose AI agent. Testing such capabilities is hard, in part because the building blocks of world models are ill …
Keyword mnemonics are memorable explanations that link new terms to simpler keywords. Prior works generate mnemonics for students, but they do not guide models toward …
When answering questions, LLMs can convey not only an answer, but a level of confidence about the answer being correct. This includes explicit confidence markers (e.g., giving a …
C Malaviya, P Agrawal, K Ganchev… - arXiv preprint arXiv …, 2024 - arxiv.org
Experts in various fields routinely perform methodical writing tasks to plan, organize, and report their work. From a clinician writing a differential diagnosis for a patient, to a teacher …
J Cheng, Y Lu, X Gu, P Ke, X Liu, Y Dong… - arXiv preprint arXiv …, 2024 - arxiv.org
Although Large Language Models (LLMs) are becoming increasingly powerful, they still exhibit significant but subtle weaknesses, such as mistakes in instruction-following or coding …
Large language models (LLMs) have shown promising abilities as cost-effective and reference-free evaluators for assessing language generation quality. In particular, pairwise …
S Ghosh, T Srinivasan, S Swayamdipta - arXiv preprint arXiv:2407.01878, 2024 - arxiv.org
Human evaluation of generated language through pairwise preference judgments is pervasive. However, under common scenarios, such as when generations from a model pair …