Authors
Aya Mudrik, Girish N Nadkarni, Orly Efros, Benjamin S Glicksberg, Eyal Klang, Shelly Soffer
Publication date
2024
Source
medRxiv
Pages
2024.04.26.24306358
Publisher
Cold Spring Harbor Laboratory Press
Description
Rationale and Objectives
Large Language Models (LLMs) have the potential to enhance medical training, education, and diagnosis. However, since these models were not originally designed for medical purposes, there are concerns regarding their reliability and safety in clinical settings. This review systematically assesses the utility, advantages, and potential risks of employing LLMs in the field of hematology.
Materials and Methods
We searched the PubMed, Web of Science, and Scopus databases for original publications on LLM applications in hematology. We limited the search to articles published in English between December 1, 2022, a start date chosen to coincide with the introduction of ChatGPT, and March 25, 2024. To evaluate the risk of bias, we used an adapted version of the Quality Assessment of Diagnostic Accuracy Studies criteria (QUADAS-2).
Results
Eleven studies fulfilled the eligibility criteria. The studies varied in their goals and methods, covering medical education, diagnosis, and clinical practice. GPT-3.5 and GPT-4 demonstrated superior performance in diagnostic tasks and medical information propagation compared to other models such as Google's Bard (currently called Gemini). GPT-4 demonstrated particularly high accuracy in tasks such as interpreting hematology cases and diagnosing hemoglobinopathy, with performance metrics of 76% diagnostic accuracy and 88% accuracy in identifying normal blood cells. However, the review also revealed discrepancies in model consistency and in the accuracy of provided references, indicating variability in the models' reliability.
Conclusion
While LLMs present significant opportunities for advancing clinical …