FActScore: Fine-grained atomic evaluation of factual precision in long form text generation

S Min, K Krishna, X Lyu, M Lewis, W Yih… - arXiv preprint arXiv …, 2023 - arxiv.org
Evaluating the factuality of long-form text generated by large language models (LMs) is non-
trivial because (1) generations often contain a mixture of supported and unsupported pieces …

A long way to go: Investigating length correlations in RLHF

P Singhal, T Goyal, J Xu, G Durrett - arXiv preprint arXiv:2310.03716, 2023 - arxiv.org
Great successes have been reported using Reinforcement Learning from Human Feedback
(RLHF) to align large language models. Open-source preference datasets and reward …

Group preference optimization: Few-shot alignment of large language models

S Zhao, J Dang, A Grover - arXiv preprint arXiv:2310.11523, 2023 - arxiv.org
Many applications of large language models (LLMs), ranging from chatbots to creative
writing, require nuanced subjective judgments that can differ significantly across different …

Beyond reverse KL: Generalizing direct preference optimization with diverse divergence constraints

C Wang, Y Jiang, C Yang, H Liu, Y Chen - arXiv preprint arXiv:2309.16240, 2023 - arxiv.org
The increasing capabilities of large language models (LLMs) raise opportunities for artificial
general intelligence but concurrently amplify safety concerns, such as potential misuse of AI …

A survey on LoRA of large language models

Y Mao, Y Ge, Y Fan, W Xu, Y Mi, Z Hu… - arXiv preprint arXiv …, 2024 - arxiv.org
Low-Rank Adaptation (LoRA), which updates the dense neural network layers with
pluggable low-rank matrices, is one of the best-performing parameter-efficient fine-tuning …
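
A minimal illustrative sketch of the mechanism this snippet refers to (pluggable low-rank matrices alongside a frozen dense layer); the class name, rank, and scaling values below are assumptions chosen for illustration, not taken from the survey:

    import torch
    import torch.nn as nn

    class LoRALinear(nn.Module):
        # Illustrative LoRA-adapted linear layer: the pre-trained dense weight is
        # frozen, and a trainable low-rank update B @ A is added to its output.
        def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
            super().__init__()
            self.base = base
            self.base.weight.requires_grad_(False)  # freeze the dense layer
            # Pluggable low-rank matrices: A projects down to `rank`, B projects back up.
            self.lora_a = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
            self.lora_b = nn.Parameter(torch.zeros(base.out_features, rank))
            self.scaling = alpha / rank

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            # Frozen dense projection plus the scaled low-rank correction.
            return self.base(x) + self.scaling * (x @ self.lora_a.T @ self.lora_b.T)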

Prototypical Reward Network for Data-Efficient RLHF

J Zhang, X Wang, Y Jin, C Chen, X Zhang… - arXiv preprint arXiv …, 2024 - arxiv.org
The reward model for Reinforcement Learning from Human Feedback (RLHF) has proven
effective in fine-tuning Large Language Models (LLMs). Notably, collecting human feedback …

Recovering the Pre-Fine-Tuning Weights of Generative Models

E Horwitz, J Kahana, Y Hoshen - arXiv preprint arXiv:2402.10208, 2024 - arxiv.org
The dominant paradigm in generative modeling consists of two steps: i) pre-training on a
large-scale but unsafe dataset, ii) aligning the pre-trained model with human values via fine …

Decoding-time Realignment of Language Models

T Liu, S Guo, L Bianco, D Calandriello… - arXiv preprint arXiv …, 2024 - arxiv.org
Aligning language models with human preferences is crucial for reducing errors and biases
in these models. Alignment techniques, such as reinforcement learning from human …

ChatMap: Large Language Model Interaction with Cartographic Data

E Unlu - arXiv preprint arXiv:2310.01429, 2023 - arxiv.org
The swift advancement and widespread availability of foundational Large Language Models
(LLMs), complemented by robust fine-tuning methodologies, have catalyzed their adaptation …

Finding Safety Neurons in Large Language Models

J Chen, X Wang, Z Yao, Y Bai, L Hou, J Li - arXiv preprint arXiv …, 2024 - arxiv.org
Large language models (LLMs) excel in various capabilities but also pose safety risks such
as generating harmful content and misinformation, even after safety alignment. In this paper …