Information extraction from free-form CV documents in multiple languages

Authors

Davor Vukadin, Adrian Satja Kurdija, Goran Delač, Marin Šilić

Publication date

2021/6/9

Journal

IEEE Access

Volume

Pages

84559-84575

Publisher

IEEE

Description

This paper proposes two natural language processing models for extracting useful information from multilingual, unstructured (free-form) CV documents. The models identify the relevant document sections (personal information, education, employment, etc.) and the corresponding specific information at the lower hierarchy level (names, addresses, roles, skill competences, etc.). Our approach employs the transformer architecture, specifically the multilingual implementation of its encoder part in the form of the BERT language model. The models are trained and tested on a large, manually annotated CV dataset, achieving high scores on standard accuracy measures. The proposed models exhibit the important properties of end-to-end training and interpretability; the latter was investigated by visualizing the model attention and its vector representations.

Total citations

Cited by 17

Citations per year: 2022: 3, 2023: 9, 2024: 5
