anyOCR: A sequence learning based OCR system for unlabeled historical documents

Authors

Martin Jenckel, Syed Saqib Bukhari, Andreas Dengel

Publication date

2016/12/4

Conference

2016 23rd International Conference on Pattern Recognition (ICPR)

Pages

4035-4040

Publisher

IEEE

Description

Institutes and libraries around the globe are preserving literary heritage by digitizing historical documents. However, to make this data easily accessible, the scanned documents need to be transformed into searchable text. State-of-the-art OCR systems using Long Short-Term Memory (LSTM) networks have been applied successfully to recognize text in both printed and handwritten form. Besides the general challenges of historical documents, e.g. poor image quality and damaged characters, unknown scripts and old fonts in particular make it difficult to provide the large amount of transcribed training data these methods require to perform well. Transcribing the documents manually is very costly in terms of man-hours and requires language-specific expertise. The unknown fonts and the requirement for meaningful context also make the use of synthetic data infeasible. We therefore propose an end-to-end …

Total citations

Cited by 37

[Chart: citations per year, 2016 to 2024; bar values as extracted: 1, 4, 12, 7, 2, 6, 3, 1]
