Predicting the sample size of randomized controlled trials using natural language processing

P Windisch, F Dennstädt, C Koechli, R Förster, C Schröder, DM Aebersold, DR Zwahlen

JAMIA Open, 2024 • academic.oup.com

Objectives

Extracting the sample size from randomized controlled trials (RCTs) remains a challenge to developing better search functionalities or automating systematic reviews. Most current approaches rely on the sample size being explicitly mentioned in the abstract. The objective of this study was, therefore, to develop and validate additional approaches.

Materials and Methods

847 RCTs from high-impact medical journals were tagged with 6 different entities that could indicate the sample size. A named entity recognition (NER) model was trained to extract the entities and then deployed on a test set of 150 RCTs. The entities’ performance in predicting the actual number of trial participants who were randomized was assessed and possible combinations of the entities were evaluated to create predictive models. The test set was also used to evaluate the performance of GPT-4o on the same task.
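
One way to picture the combination step is a priority-ordered fallback over the extracted entities. The sketch below is illustrative only: the abstract does not name the six entities, so the labels (`randomized_total`, `analyzed_total`, `arm_size`) and the order in which they are tried are hypothetical, and this is not the authors' model.

```python
from typing import Optional

# Hypothetical entity labels; the abstract does not list the six entities.
PRIORITY = ("randomized_total", "analyzed_total")


def predict_sample_size(entities: dict[str, list[int]]) -> Optional[int]:
    """Return a predicted sample size, or None if no rule applies."""
    # Rule 1: use a directly stated total when all extracted values agree.
    for label in PRIORITY:
        values = entities.get(label, [])
        if len(set(values)) == 1:
            return values[0]
    # Rule 2: otherwise sum the per-arm sizes, if at least two arms were found.
    arms = entities.get("arm_size", [])
    if len(arms) >= 2:
        return sum(arms)
    # No usable entity: abstain rather than guess.
    return None
```

Returning `None` models abstention. A stricter rule set of this kind answers for fewer trials but may be right more often when it does answer, which is the kind of trade-off that different entity combinations can expose.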

Results

The most accurate model could make predictions for 64.7% of trials in the test set, and the resulting predictions matched the ground truth in 93.8% of those cases. GPT-4o made a prediction for 94.7% of trials, and its predictions matched the ground truth in 90.8% of those cases.
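
Each system is described by two figures: coverage (the share of trials that received a prediction) and accuracy among the trials that were predicted. A minimal sketch of how such metrics could be computed follows, together with the share of all test trials answered correctly that the reported percentages imply. The function name and the toy data are assumptions, not the authors' evaluation code.

```python
from typing import Optional, Sequence


def coverage_and_accuracy(
    preds: Sequence[Optional[int]], truth: Sequence[int]
) -> tuple[float, float]:
    """Coverage over all trials, and exact-match accuracy over answered trials."""
    if not preds or len(preds) != len(truth):
        raise ValueError("preds and truth must be non-empty and equally long")
    # Keep only the trials for which a prediction was made.
    answered = [(p, t) for p, t in zip(preds, truth) if p is not None]
    coverage = len(answered) / len(preds)
    accuracy = sum(p == t for p, t in answered) / len(answered) if answered else 0.0
    return coverage, accuracy


# Toy data (not from the paper): 4 trials, 3 answered, 2 of those correct.
cov, acc = coverage_and_accuracy([100, None, 250, 80], [100, 60, 240, 80])

# Share of all test trials answered correctly, implied by the reported
# (rounded) percentages: coverage multiplied by accuracy.
ner_overall = 0.647 * 0.938  # about 0.607
gpt_overall = 0.947 * 0.908  # about 0.860
```

Read this way, the rounded figures imply that GPT-4o gave a correct answer for roughly 86% of all test trials, against roughly 61% for the most accurate NER-based model, even though the latter was slightly more accurate on the trials it did answer.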

Discussion

This study presents an NER model that extracts different entities from the abstract of an RCT, which can then be used to predict the sample size. The entities can be combined in different ways to obtain models with different characteristics.

Conclusion

Training an NER model to predict the sample size from RCTs is feasible. Large language models can deliver similar performance without prior training on the task, although at a higher cost due to proprietary technology and/or required computational power.

Oxford University Press
