Authors
Hanna Berg, Hercules Dalianis
Publication date
2019/9/30
Journal
Proceedings of the Workshop on NLP and Pseudonymisation
Pages
8-15
Description
Electronic patient records are produced in abundance every day, and there is a demand to use them for research or management purposes. The free text of these records, however, contains information that can identify the patient, so tools are needed to detect this sensitive information.
The aim is to compare two machine learning algorithms, Long Short-Term Memory (LSTM) and Conditional Random Fields (CRF), applied to a Swedish clinical data set annotated for de-identification. The results show that CRF performs better than deep learning with LSTM; CRF gives the best results, an F1 score of 0.91, when more data from within the same domain is added. Adding general open data, on the other hand, did not improve the results.
Total citations
[Citations-per-year chart, 2019-2024]