A framework for human evaluation of large language models in healthcare derived from literature review

TYC Tam, S Sivarajkumar, S Kapoor, AV Stolyar… - NPJ Digital …, 2024 - nature.com
With generative artificial intelligence (GenAI), particularly large language models (LLMs),
continuing to make inroads in healthcare, assessing LLMs with human evaluations is …

Current applications and challenges in large language models for patient care: a systematic review

F Busch, L Hoffmann, C Rueger, EHC van Dijk… - Communications …, 2025 - nature.com
Background: The introduction of large language models (LLMs) into clinical practice
promises to improve patient education and empowerment, thereby personalizing medical …

Disparities in medical recommendations from AI-based chatbots across different countries/regions

KE Gumilar, BR Indraprasta, YC Hsu, ZY Yu, H Chen… - Scientific Reports, 2024 - nature.com
This study explores disparities and opportunities in healthcare information provided by AI
chatbots. We focused on recommendations for adjuvant therapy in endometrial cancer …

Higher education students' perceptions of ChatGPT: A global study of early reactions

D Ravšelj, D Keržič, N Tomaževič, L Umek, N Brezovar… - PLoS ONE, 2025 - journals.plos.org
The paper presents the most comprehensive and large-scale global study to date on how
higher education students perceived the use of ChatGPT in early 2024. With a sample of …

Large language models in patient education: a scoping review of applications in medicine

S Aydin, M Karabacak, V Vlachos, K Margetis - Frontiers in Medicine, 2024 - frontiersin.org
Introduction: Large Language Models (LLMs) are sophisticated algorithms that analyze and
generate vast amounts of textual data, mimicking human communication. Notable LLMs …

Performance of large language models in patient complaint resolution: web-based cross-sectional survey

LPX Yong, JYM Tung, ZY Lee, WS Kuan… - Journal of Medical Internet …, 2024 - jmir.org
Background: Patient complaints are a perennial challenge faced by health care institutions
globally, requiring extensive time and effort from health care workers. Despite these efforts …

ChatGPT in radiology: A systematic review of performance, pitfalls, and future perspectives

P Keshavarz, S Bagherieh, SA Nabipoorashrafi… - Diagnostic and …, 2024 - Elsevier
Purpose: The purpose of this study was to systematically review the reported performances
of ChatGPT, identify potential limitations, and explore future directions for its integration …

Augmenting large language models with rules for enhanced domain-specific interactions: The case of medical diagnosis

DP Panagoulias, M Virvou, GA Tsihrintzis - Electronics, 2024 - mdpi.com
In this paper, we present a novel Artificial Intelligence (AI)-empowered system that enhances
large language models and other machine learning tools with rules to provide primary care …

Accuracy assessment of ChatGPT responses to frequently asked questions regarding anterior cruciate ligament surgery

JB Villarreal-Espinosa, RS Berreta, F Allende… - The Knee, 2024 - Elsevier
Background: The emergence of artificial intelligence (AI) has allowed users to have access to
large sources of information in a chat-like manner. Thereby, we sought to evaluate ChatGPT …

Assessment of Large Language Models (LLMs) in decision-making support for gynecologic oncology

KE Gumilar, BR Indraprasta, AS Faridzi… - Computational and …, 2024 - Elsevier
Objective: This study investigated the ability of Large Language Models (LLMs) to provide
accurate and consistent answers by focusing on their performance in complex gynecologic …