Compression, transduction, and creation: A unified framework for evaluating natural language...

PP Liang, A Zadeh, LP Morency - ACM Computing Surveys, 2024 - dl.acm.org

Multimodal machine learning is a vibrant multi-disciplinary research field that aims to design
computer agents with intelligent capabilities such as understanding, reasoning, and learning …

Cited by 75 · Related articles

[PDF] arxiv.org

Foundations and Trends in Multimodal Machine Learning: Principles, Challenges, and Open Questions

PP Liang, A Zadeh, LP Morency - arXiv preprint arXiv:2209.03430, 2022 - arxiv.org

Multimodal machine learning is a vibrant multi-disciplinary research field that aims to design
computer agents with intelligent capabilities such as understanding, reasoning, and learning …

Cited by 151 · Related articles · All 2 versions

[PDF] arxiv.org

RLPrompt: Optimizing discrete text prompts with reinforcement learning

M Deng, J Wang, CP Hsieh, Y Wang, H Guo… - arXiv preprint arXiv …, 2022 - arxiv.org

Prompting has shown impressive success in enabling large pretrained language models
(LMs) to perform diverse NLP tasks, especially when only few downstream data are …

Cited by 321 · Related articles · All 9 versions

[PDF] arxiv.org

Towards a unified multi-dimensional evaluator for text generation

M Zhong, Y Liu, D Yin, Y Mao, Y Jiao, P Liu… - arXiv preprint arXiv …, 2022 - arxiv.org

Multi-dimensional evaluation is the dominant paradigm for human evaluation in Natural
Language Generation (NLG), i.e., evaluating the generated text from multiple explainable …

Cited by 218 · Related articles · All 6 versions

[PDF] jair.org Full View

Repairing the cracked foundation: A survey of obstacles in evaluation practices for generated text

S Gehrmann, E Clark, T Sellam - Journal of Artificial Intelligence Research, 2023 - jair.org

Evaluation practices in natural language generation (NLG) have many known flaws,
but improved evaluation approaches are rarely widely adopted. This issue has become …

Cited by 158 · Related articles · All 6 versions

[PDF] aaai.org

MedAlign: A clinician-generated dataset for instruction following with electronic medical records

SL Fleming, A Lozano, WJ Haberkorn… - Proceedings of the …, 2024 - ojs.aaai.org

The ability of large language models (LLMs) to follow natural language instructions with
human-level fluency suggests many opportunities in healthcare to reduce administrative …

Cited by 60 · Related articles · All 3 versions

[PDF] arxiv.org

AlignScore: Evaluating factual consistency with a unified alignment function

Y Zha, Y Yang, R Li, Z Hu - arXiv preprint arXiv:2305.16739, 2023 - arxiv.org

Many text generation applications require the generated text to be factually consistent with
input information. Automatic evaluation of factual consistency is challenging. Previous work …

Cited by 122 · Related articles · All 4 versions

[PDF] arxiv.org

Generative knowledge graph construction: A review

H Ye, N Zhang, H Chen, H Chen - arXiv preprint arXiv:2210.12714, 2022 - arxiv.org

Generative Knowledge Graph Construction (KGC) refers to those methods that leverage the
sequence-to-sequence framework for building knowledge graphs, which is flexible and can …

Cited by 76 · Related articles · All 4 versions

[PDF] neurips.cc

Text alignment is an efficient unified model for massive NLP tasks

Y Zha, Y Yang, R Li, Z Hu - Advances in Neural Information …, 2023 - proceedings.neurips.cc

Large language models (LLMs), typically designed as a function of next-word prediction,
have excelled across extensive NLP tasks. Despite the generality, next-word prediction is …

Cited by 7 · Related articles · All 5 versions

[PDF] arxiv.org

ReCEval: Evaluating reasoning chains via correctness and informativeness

A Prasad, S Saha, X Zhou, M Bansal - arXiv preprint arXiv:2304.10703, 2023 - arxiv.org

Multi-step reasoning ability is fundamental to many natural language tasks, yet it is unclear
what constitutes a good reasoning chain and how to evaluate them. Most existing methods …

Cited by 31 · Related articles · All 4 versions
