Y Yang, Y Ma, P Liu - arXiv preprint arXiv:2407.13647, 2024 - arxiv.org
When large language models (LLMs) exceed human-level capabilities, it becomes increasingly challenging to provide full-scale and accurate supervision for these models …
L Tao, Y Li - arXiv preprint arXiv:2409.08813, 2024 - arxiv.org
The burgeoning capabilities of large language models (LLMs) have underscored the need for alignment to ensure these models act in accordance with human values and intentions …
Large language models (LLMs) can solve arithmetic word problems with high accuracy, but little is known about how well they generalize to problems that are more complex than the …
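A minimal sketch of the kind of complexity-generalization probe this snippet hints at: generate arithmetic word problems with a controllable number of steps and measure accuracy as the step count grows. The generator and the `ask_model` callable are illustrative assumptions, not the cited paper's setup.

```python
# Hypothetical probe: accuracy on arithmetic word problems vs. step count.
import random

def make_problem(n_steps: int, seed: int = 0) -> tuple[str, int]:
    """Build an n-step add/subtract word problem and its ground-truth answer."""
    rng = random.Random(seed)
    total = rng.randint(1, 9)
    text = f"Start with {total}."
    for _ in range(n_steps):
        k = rng.randint(1, 9)
        if rng.random() < 0.5:
            total += k
            text += f" Add {k}."
        else:
            total -= k
            text += f" Subtract {k}."
    return text + " What is the result?", total

# for depth in (2, 4, 8, 16):           # ask_model is a stand-in LLM call
#     q, answer = make_problem(depth)
#     correct = ask_model(q).strip() == str(answer)
```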
How can we harness the collective capabilities of multiple Large Language Models (LLMs) to create an even more powerful model? This question forms the foundation of our research …
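One simple baseline for pooling multiple LLMs, shown only to make the question concrete, is majority voting over their answers; the cited work may combine models very differently (e.g., weight fusion or routing). `models` here is a list of hypothetical callables.

```python
# Hypothetical baseline: majority vote across several LLMs' answers.
from collections import Counter

def majority_vote(models, prompt: str) -> str:
    """Return the most common answer across the given model callables."""
    answers = [m(prompt).strip() for m in models]
    return Counter(answers).most_common(1)[0][0]
```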
Understanding and mitigating the potential risks associated with foundation models (FMs) hinges on developing effective interpretability methods. Sparse Autoencoders (SAEs) have …
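For readers unfamiliar with SAEs: a sparse autoencoder reconstructs model activations through an overcomplete latent layer whose activations are pushed toward zero, so individual latents tend to correspond to interpretable features. The sketch below uses standard PyTorch; the sizes and the `l1_coef` penalty weight are illustrative assumptions.

```python
# Minimal sparse autoencoder over cached model activations (illustrative).
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseAutoencoder(nn.Module):
    def __init__(self, d_model: int, n_latents: int):
        super().__init__()
        self.encoder = nn.Linear(d_model, n_latents)
        self.decoder = nn.Linear(n_latents, d_model)

    def forward(self, x: torch.Tensor):
        # ReLU keeps latent codes non-negative; the L1 penalty below
        # drives most of them to exactly zero (sparsity).
        z = F.relu(self.encoder(x))
        x_hat = self.decoder(z)
        return x_hat, z

def sae_loss(x, x_hat, z, l1_coef: float = 1e-3):
    # Reconstruction error plus an L1 sparsity penalty on the latents.
    return F.mse_loss(x_hat, x) + l1_coef * z.abs().mean()

sae = SparseAutoencoder(d_model=768, n_latents=4096)
acts = torch.randn(32, 768)          # stand-in for cached activations
x_hat, z = sae(acts)
sae_loss(acts, x_hat, z).backward()
```

After training, one inspects which inputs make each latent fire to recover human-interpretable features.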
Large language models (LLMs) have demonstrated impressive capabilities, but still suffer from inconsistency issues (e.g., LLMs can react differently to disturbances like rephrasing or …
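One simple way to quantify the rephrasing inconsistency this snippet describes is to query a model with paraphrases of the same question and measure answer agreement. A hedged sketch, where `ask_model` is a hypothetical LLM call rather than an API from the cited paper:

```python
# Hypothetical consistency metric under paraphrase perturbations.
from collections import Counter

def consistency_rate(ask_model, paraphrases: list[str]) -> float:
    """Fraction of paraphrases whose answer matches the majority answer."""
    answers = [ask_model(p).strip().lower() for p in paraphrases]
    _majority, count = Counter(answers).most_common(1)[0]
    return count / len(answers)

paraphrases = [
    "What is 17 + 25?",
    "Compute the sum of 17 and 25.",
    "If you add 25 to 17, what do you get?",
]
# rate = consistency_rate(my_llm, paraphrases)  # 1.0 = fully consistent
```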
While generalization over tasks from easy to hard is crucial to profile large language models (LLMs), datasets with fine-grained difficulty annotations for each problem across a broad …
D Wu, S Choudhury - arXiv preprint arXiv:2501.00911, 2025 - arxiv.org
Aligning large language models (LLMs) to human preferences is challenging in domains where preference data is unavailable. We address the problem of learning reward models …
Fine-tuning Large Language Models (LLMs) incurs considerable training costs, driving the need for data-efficient training with optimised data ordering. Human-inspired strategies offer …
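To make "optimised data ordering" concrete: a common human-inspired strategy is curriculum learning, presenting fine-tuning examples from easy to hard. A minimal sketch, using token length as a crude easiness proxy; this proxy is an assumption for illustration, not the cited paper's method.

```python
# Hypothetical curriculum ordering for a fine-tuning data stream.
def curriculum_order(examples: list[str]) -> list[str]:
    # Shorter examples first as a rough easiness proxy; real curricula
    # often use model loss or human difficulty annotations instead.
    return sorted(examples, key=lambda ex: len(ex.split()))

batch = [
    "Prove that the sum of two even integers is even.",
    "2 + 2 = ?",
    "Solve for x: 3x + 7 = 22.",
]
for example in curriculum_order(batch):
    ...  # feed to the fine-tuning loop in this order
```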