Understanding the effects of RLHF on LLM generalisation and diversity

R Kirk, I Mediratta, C Nalmpantis, J Luketina… - arXiv preprint arXiv …, 2023 - arxiv.org
Large language models (LLMs) fine-tuned with reinforcement learning from human
feedback (RLHF) have been used in some of the most widely deployed AI models to date …

Augmented behavioral annotation tools, with application to multimodal datasets and models: a systematic review

E Watson, T Viana, S Zhang - AI, 2023 - mdpi.com
Annotation tools are an essential component in the creation of datasets for machine learning
purposes. They have evolved greatly since the turn of the century, and now …

Preference-grounded token-level guidance for language model fine-tuning

S Yang, S Zhang, C Xia, Y Feng… - Advances in Neural …, 2024 - proceedings.neurips.cc
Aligning language models (LMs) with preferences is an important problem in natural
language generation. A key challenge is that preferences are typically provided at the …

End-to-end Story Plot Generator

H Zhu, A Cohen, D Wang, K Yang, X Yang… - arXiv preprint arXiv …, 2023 - arxiv.org
Story plots, while short, carry most of the essential information of a full story that may contain
tens of thousands of words. We study the problem of automatic generation of story plots …

SWAG: Storytelling With Action Guidance

J Pei, Z Patel, K El-Refai, T Li - Findings of the Association for …, 2024 - aclanthology.org
Automated long-form story generation typically employs long-context large language models
(LLMs) for one-shot creation, which can produce cohesive but not necessarily engaging …

Curiosity-Driven Reinforcement Learning from Human Feedback

H Sun, Y Chai, S Wang, Y Sun, H Wu… - arXiv preprint arXiv …, 2025 - arxiv.org
Reinforcement learning from human feedback (RLHF) has proven effective in aligning large
language models (LLMs) with human preferences, but often at the cost of reduced output …

GDPO: Learning to Directly Align Language Models with Diversity Using GFlowNets

OJ Kwon, DE Matsunaga, KE Kim - arXiv preprint arXiv:2410.15096, 2024 - arxiv.org
A critical component of the current generation of language models is preference alignment,
which aims to precisely control the model's behavior to meet human needs and values. The …

Computational Creativity in Media Production: At the Crossroad of Progress and Peril

D Keller - 2023 - bora.uib.no
This study focuses on an approach to generate suggested video stories from raw source
footage by using a multilayered hierarchical classification structure and narrative generation …