Authors
Kareem Mokhtar, Syed Saqib Bukhari, Andreas Dengel
Publication date
2018/4/24
Workshop paper
2018 13th IAPR International Workshop on Document Analysis Systems (DAS)
Pages
429-434
Publisher
IEEE
Description
Although state-of-the-art OCR systems perform very well, they still introduce errors for various reasons, and their performance degrades further on historical documents with old manuscripts. For this reason, post-OCR error correction has been an open problem for many years, and many approaches have been proposed in recent years. This paper contributes to the field of post-OCR error correction by introducing two novel deep learning approaches that improve the accuracy of OCR systems, along with a post-processing technique that can further enhance the quality of the output. These approaches are based on Neural Machine Translation (NMT) and were motivated by the great success of deep learning in Natural Language Processing. Finally, we will compare the state-of-the-art approaches in Post-OCR Error …
Scholar articles
K Mokhtar, SS Bukhari, A Dengel - 2018 13th IAPR International Workshop on Document …, 2018