Chinese medical concept normalization by using text and comorbidity network embedding

Y Zhang, X Ma, G Song - 2018 IEEE international conference …, 2018 - ieeexplore.ieee.org
2018 IEEE International Conference on Data Mining (ICDM), 2018. ieeexplore.ieee.org
Chinese medical concept normalization, which maps non-standard medical concepts to standard expressions, is an NLP task with wide-ranging applications in medical big data research and clinical statistics. Many previous works apply supervised methods, which require large amounts of annotated data. These methods cannot address the high cost of medical data annotation, which demands substantial professional knowledge and experience. Meanwhile, existing unsupervised methods perform poorly when faced with the varied non-standard expressions that come from different data sources. In this paper, we propose DUNE (Disease Unsupervised Normalization by Embedding), an unsupervised Chinese medical concept normalization framework that applies a denoising auto-encoder (DAE) and network embedding. We formulate the task as finding mention-entity pairs with high text similarity and high comorbidity similarity. To handle noise in the text, we design a multi-view attention-based denoising auto-encoder (MADAE) that captures text information from multiple views, reduces the influence of noise, and transforms text into denoised vectors. To introduce comorbidity information, we construct a comorbidity network from medical records, with both standard and non-standard disease names as nodes. Because non-standard expressions are diverse, one disease may correspond to several different nodes, which introduces noise into the comorbidity network. To handle this structural noise, we propose a denoising network embedding framework that uses text information to reduce the structural noise and embeds the nodes as vectors for measuring comorbidity similarity. Experimental results show that our method outperforms existing unsupervised baselines and approaches the performance of classical supervised machine learning models on this task.
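The mention-entity matching formulation can be illustrated with a minimal sketch. It is not the DUNE implementation: it assumes the text vectors and comorbidity (network) vectors have already been computed, and it combines the two cosine similarities with a weighted sum controlled by a hypothetical parameter `alpha`, which is an assumption made here because the abstract does not say how the two similarities are combined. The function names and the toy vectors are likewise illustrative.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def normalize_mention(mention_text_vec, mention_net_vec, entities, alpha=0.5):
    """Return the standard entity most similar to a non-standard mention.

    entities maps a standard name to a (text_vec, net_vec) pair.
    alpha weights text similarity against comorbidity similarity;
    the weighted sum is an assumption of this sketch, not taken from the paper.
    """
    best_name, best_score = None, float("-inf")
    for name, (text_vec, net_vec) in entities.items():
        score = (alpha * cosine(mention_text_vec, text_vec)
                 + (1 - alpha) * cosine(mention_net_vec, net_vec))
        if score > best_score:
            best_name, best_score = name, score
    return best_name, best_score


# Toy two-dimensional embeddings, invented for illustration only.
entities = {
    "diabetes mellitus": ([1.0, 0.0], [1.0, 0.0]),
    "hypertension": ([0.0, 1.0], [0.0, 1.0]),
}
best_name, best_score = normalize_mention([0.9, 0.1], [0.8, 0.2], entities)
print(best_name)
```

In this sketch the denoised text vectors would come from a model such as MADAE and the network vectors from the denoising network embedding; here they are simply passed in as plain lists.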
