FLASK: Fine-grained language model evaluation based on alignment skill sets

S Min, K Krishna, X Lyu, M Lewis, W Yih… - arXiv preprint arXiv …, 2023 - arxiv.org

Evaluating the factuality of long-form text generated by large language models (LMs) is non-
trivial because (1) generations often contain a mixture of supported and unsupported pieces …

Cited by 439 - Related articles - All 8 versions

[PDF] arxiv.org

TrustLLM: Trustworthiness in large language models

Y Huang, L Sun, H Wang, S Wu, Q Zhang, Y Li… - arXiv preprint arXiv …, 2024 - arxiv.org

Large language models (LLMs), exemplified by ChatGPT, have gained considerable
attention for their excellent natural language processing capabilities. Nonetheless, these …

Cited by 232 - Related articles - All 4 versions

[PDF] arxiv.org

Aligning large language models with human: A survey

Y Wang, W Zhong, L Li, F Mi, X Zeng, W Huang… - arXiv preprint arXiv …, 2023 - arxiv.org

Large Language Models (LLMs) trained on extensive textual corpora have emerged as
leading solutions for a broad array of Natural Language Processing (NLP) tasks. Despite …

Cited by 274 - Related articles - All 2 versions

[HTML] mlr.press

[HTML] Position: TrustLLM: Trustworthiness in large language models

Y Huang, L Sun, H Wang, S Wu… - International …, 2024 - proceedings.mlr.press

Large language models (LLMs) have gained considerable attention for their excellent
natural language processing capabilities. Nonetheless, these LLMs present many …

Cited by 32 - Related articles

[PDF] arxiv.org

LLM-based NLG evaluation: Current status and challenges

M Gao, X Hu, J Ruan, X Pu, X Wan - arXiv preprint arXiv:2402.01383, 2024 - arxiv.org

Evaluating natural language generation (NLG) is a vital but challenging problem in artificial
intelligence. Traditional evaluation metrics mainly capturing content (e.g., n-gram) overlap …

Cited by 63 - Related articles - All 2 versions

[PDF] openreview.net

The unlocking spell on base LLMs: Rethinking alignment via in-context learning

BY Lin, A Ravichander, X Lu, N Dziri… - The Twelfth …, 2023 - openreview.net

Alignment tuning has become the de facto standard practice for enabling base large
language models (LLMs) to serve as open-domain AI assistants. The alignment tuning …

Cited by 108 - Related articles - All 3 versions

[PDF] arxiv.org

From Generation to Judgment: Opportunities and Challenges of LLM-as-a-judge

D Li, B Jiang, L Huang, A Beigi, C Zhao, Z Tan… - arXiv preprint arXiv …, 2024 - arxiv.org

Assessment and evaluation have long been critical challenges in artificial intelligence (AI)
and natural language processing (NLP). However, traditional methods, whether matching …

Cited by 7 - Related articles - All 3 versions

[PDF] openreview.net

Prometheus: Inducing fine-grained evaluation capability in language models

S Kim, J Shin, Y Cho, J Jang, S Longpre… - The Twelfth …, 2023 - openreview.net

Recently, GPT-4 has become the de facto evaluator for long-form text generated by large
language models (LLMs). However, for practitioners and researchers with large and custom …

Cited by 127 - Related articles - All 3 versions

[PDF] arxiv.org

Large language model alignment: A survey

T Shen, R Jin, Y Huang, C Liu, W Dong, Z Guo… - arXiv preprint arXiv …, 2023 - arxiv.org

Recent years have witnessed remarkable progress made in large language models (LLMs).
Such advancements, while garnering significant attention, have concurrently elicited various …

Cited by 133 - Related articles - All 2 versions

[PDF] arxiv.org

Self-taught evaluators

T Wang, I Kulikov, O Golovneva, P Yu, W Yuan… - arXiv preprint arXiv …, 2024 - arxiv.org

Model-based evaluation is at the heart of successful model development--as a reward
model for training, and as a replacement for human evaluation. To train such evaluators, the …

Cited by 28 - Related articles - All 3 versions
