Authors
Davor Vukadin, Adrian Satja Kurdija, Goran Delač, Marin Šilić
Publication date
2021/6/9
Journal
IEEE Access
Volume
9
Pages
84559-84575
Publisher
IEEE
Description
This paper proposes two natural language processing models for extracting useful information from multilingual, unstructured (free-form) CV documents. The models identify the relevant document sections (personal information, education, employment, etc.) and the corresponding specific information at the lower hierarchy level (names, addresses, roles, skill competences, etc.). Our approach employs the transformer architecture, specifically a multilingual implementation of its encoder part in the form of the BERT language model. The models are trained and tested on a large, manually annotated CV dataset, achieving high scores on standard accuracy measures. The proposed models exhibit the important properties of end-to-end training and interpretability; the latter was investigated by visualizing the model attention and its vector representations.