Understanding the effects of RLHF on LLM generalisation and diversity

R Kirk, I Mediratta, C Nalmpantis, J Luketina… - arXiv preprint arXiv …, 2023 - arxiv.org
Large language models (LLMs) fine-tuned with reinforcement learning from human
feedback (RLHF) have been used in some of the most widely deployed AI models to date …

Augmented behavioral annotation tools, with application to multimodal datasets and models: a systematic review

E Watson, T Viana, S Zhang - AI, 2023 - mdpi.com
Annotation tools are an essential component in the creation of datasets for machine learning
purposes. They have evolved greatly since the turn of the century, and now …

Preference-grounded token-level guidance for language model fine-tuning

S Yang, S Zhang, C Xia, Y Feng… - Advances in Neural …, 2024 - proceedings.neurips.cc
Aligning language models (LMs) with preferences is an important problem in natural
language generation. A key challenge is that preferences are typically provided at the …

End-to-end Story Plot Generator

H Zhu, A Cohen, D Wang, K Yang, X Yang… - arXiv preprint arXiv …, 2023 - arxiv.org
Story plots, while short, carry most of the essential information of a full story that may contain
tens of thousands of words. We study the problem of automatic generation of story plots …

SWAG: Storytelling With Action Guidance

J Pei, Z Patel, K El-Refai, T Li - Findings of the Association for …, 2024 - aclanthology.org
Automated long-form story generation typically employs long-context large language models
(LLMs) for one-shot creation, which can produce cohesive but not necessarily engaging …

Curiosity-Driven Reinforcement Learning from Human Feedback

H Sun, Y Chai, S Wang, Y Sun, H Wu… - arXiv preprint arXiv …, 2025 - arxiv.org
Reinforcement learning from human feedback (RLHF) has proven effective in aligning large
language models (LLMs) with human preferences, but often at the cost of reduced output …

GDPO: Learning to Directly Align Language Models with Diversity Using GFlowNets

OJ Kwon, DE Matsunaga, KE Kim - arXiv preprint arXiv:2410.15096, 2024 - arxiv.org
A critical component of the current generation of language models is preference alignment,
which aims to precisely control the model's behavior to meet human needs and values. The …

Computational Creativity in Media Production: At the Crossroad of Progress and Peril

D Keller - 2023 - bora.uib.no
This study focuses on an approach to generate suggested video stories from raw source
footage by using a multilayered hierarchical classification structure and narrative generation …