
OCR Error Correction: State-of-the-Art vs an NMT-based Approach

Authors

Kareem Mokhtar, Syed Saqib Bukhari, Andreas Dengel

Publication date

2018/4/24

Conference

2018 13th IAPR International Workshop on Document Analysis Systems (DAS)

Pages

429-434

Publisher

IEEE

Description

Although state-of-the-art OCR systems perform very well, they still introduce errors for various reasons, and their performance degrades further on historical documents with old manuscripts. That is why post-OCR error correction has been an open problem for many years. Many state-of-the-art approaches have been introduced in recent years. This paper contributes to the field of post-OCR error correction by introducing two novel deep learning approaches to improve the accuracy of OCR systems, and a post-processing technique that can further enhance the quality of the output. These approaches are based on Neural Machine Translation (NMT) and were motivated by the great success of deep learning in Natural Language Processing. Finally, we will compare the state-of-the-art approaches in Post-OCR Error …

Total citations

Cited by 48

Citations per year: 2019: 6, 2020: 7, 2021: 14, 2022: 8, 2023: 9, 2024: 4
