A systematic study and comprehensive evaluation of ChatGPT on benchmark datasets

X Hou, Y Zhao, Y Liu, Z Yang, K Wang, L Li… - ACM Transactions on …, 2024 - dl.acm.org

Large Language Models (LLMs) have significantly impacted numerous domains, including
Software Engineering (SE). Many recent publications have explored LLMs applied to …

Cited by 466 Related articles All 8 versions

[PDF] acm.org

A survey on evaluation of large language models

Y Chang, X Wang, J Wang, Y Wu, L Yang… - ACM Transactions on …, 2024 - dl.acm.org

Large language models (LLMs) are gaining increasing popularity in both academia and
industry, owing to their unprecedented performance in various applications. As LLMs …

Cited by 2011 Related articles All 4 versions

[PDF] arxiv.org

Is ChatGPT a good sentiment analyzer? A preliminary study

Z Wang, Q Xie, Y Feng, Z Ding, Z Yang… - arXiv preprint arXiv …, 2023 - arxiv.org

Recently, ChatGPT has drawn great attention from both the research community and the
public. We are particularly interested in whether it can serve as a universal sentiment …

Cited by 202 Related articles All 2 versions

[PDF] arxiv.org

A multitask, multilingual, multimodal evaluation of ChatGPT on reasoning, hallucination, and interactivity

Y Bang, S Cahyawijaya, N Lee, W Dai, D Su… - arXiv preprint arXiv …, 2023 - arxiv.org

This paper proposes a framework for quantitatively evaluating interactive LLMs such as
ChatGPT using publicly available data sets. We carry out an extensive technical evaluation …

Cited by 1449 Related articles All 5 versions

[PDF] arxiv.org

ChatGPT is a knowledgeable but inexperienced solver: An investigation of commonsense problem in large language models

N Bian, X Han, L Sun, H Lin, Y Lu, B He, S Jiang… - arXiv preprint arXiv …, 2023 - arxiv.org

Large language models (LLMs) have made significant progress in NLP. However, their
ability to memorize, represent, and leverage commonsense knowledge has been a well …

Cited by 120 Related articles All 3 versions

[PDF] arxiv.org

GPTEval: A survey on assessments of ChatGPT and GPT-4

R Mao, G Chen, X Zhang, F Guerin… - arXiv preprint arXiv …, 2023 - arxiv.org

The emergence of ChatGPT has generated much speculation in the press about its potential
to disrupt social and economic systems. Its astonishing language ability has aroused strong …

Cited by 95 Related articles All 4 versions

[PDF] arxiv.org

Deep transfer learning for automatic speech recognition: Towards better generalization

H Kheddar, Y Himeur, S Al-Maadeed, A Amira… - Knowledge-Based …, 2023 - Elsevier

Automatic speech recognition (ASR) has recently become an important challenge when
using deep learning (DL). It requires large-scale training datasets and high computational …

Cited by 81 Related articles All 5 versions

[PDF] arxiv.org

Large language models for cyber security: A systematic literature review

HX Xu, SA Wang, N Li, K Wang, Y Zhao, K Chen… - arXiv preprint arXiv …, 2024 - arxiv.org

The rapid advancement of Large Language Models (LLMs) has opened up new
opportunities for leveraging artificial intelligence in various domains, including cybersecurity …

Cited by 38 Related articles All 2 versions

[PDF] arxiv.org

FLASK: Fine-grained language model evaluation based on alignment skill sets

S Ye, D Kim, S Kim, H Hwang, S Kim, Y Jo… - arXiv preprint arXiv …, 2023 - arxiv.org

Evaluation of Large Language Models (LLMs) is challenging because aligning to human
values requires the composition of multiple skills and the required set of skills varies …

Cited by 66 Related articles All 4 versions

[HTML] sciencedirect.com

[HTML] A comprehensive evaluation of large language models on benchmark biomedical text processing tasks

I Jahan, MTR Laskar, C Peng, JX Huang - Computers in biology and …, 2024 - Elsevier

Recently, Large Language Models (LLMs) have demonstrated impressive
capability to solve a wide range of tasks. However, despite their success across various …

Cited by 47 Related articles All 5 versions
