Authors
Sheikh Faisal Rashid, Faisal Shafait, Thomas M Breuel
Publication date
2012/3/27
Conference
2012 10th IAPR International Workshop on Document Analysis Systems
Pages
105-109
Publisher
IEEE
Description
Optical character recognition (OCR) of machine-printed Latin-script documents is ubiquitously claimed as a solved problem. However, error-free OCR of degraded or noisy text is still challenging for modern OCR systems. Most recent approaches perform segmentation-based character recognition. This is tricky because segmentation of degraded text is itself problematic. This paper describes a segmentation-free text line recognition approach using a multilayer perceptron (MLP) and hidden Markov models (HMMs). A line-scanning neural network, trained with character-level contextual information and a special garbage class, is used to extract class probabilities at every pixel succession. The output of this scanning neural network is decoded by HMMs to provide character-level recognition. In evaluations on a subset of the UNLV-ISRI document collection, we achieve 98.4% character recognition accuracy that is statistically …
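The decoding step described above (per-column class probabilities turned into a character string) can be illustrated with a minimal sketch. This is not the paper's HMM topology, which the abstract does not specify: it assumes one HMM state per class, a single hypothetical `stay_prob` self-transition probability, a hypothetical `GARBAGE` label for the garbage class, and plain Viterbi decoding followed by collapsing repeated states and dropping garbage.

```python
import math

GARBAGE = "~"  # hypothetical label for the garbage (non-character) class


def viterbi_decode(posteriors, labels, stay_prob=0.6):
    """Decode per-column class posteriors into a character string.

    posteriors: one row per pixel column, each a list of probabilities
                aligned with `labels` (assumed to hold at least 2 classes).
    stay_prob:  assumed probability of remaining in the same class state.
    """
    n = len(labels)
    if not posteriors:
        return ""
    log_stay = math.log(stay_prob)
    log_move = math.log((1.0 - stay_prob) / (n - 1))
    eps = 1e-12  # guards against log(0)

    def trans(i, j):
        return log_stay if i == j else log_move

    # Initialise with a uniform prior over classes.
    score = [math.log(p + eps) - math.log(n) for p in posteriors[0]]
    back = []
    for row in posteriors[1:]:
        new_score, ptr = [], []
        for j in range(n):
            best_i = max(range(n), key=lambda i: score[i] + trans(i, j))
            new_score.append(score[best_i] + trans(best_i, j)
                             + math.log(row[j] + eps))
            ptr.append(best_i)
        score = new_score
        back.append(ptr)

    # Trace the best state sequence backwards.
    state = max(range(n), key=lambda j: score[j])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    path.reverse()

    # Collapse runs of the same state and drop the garbage class.
    out, prev = [], None
    for s in path:
        if s != prev and labels[s] != GARBAGE:
            out.append(labels[s])
        prev = s
    return "".join(out)


labels = ["a", "b", GARBAGE]
hi, lo = 0.9, 0.05
columns = [
    [hi, lo, lo], [hi, lo, lo],  # two columns of "a"
    [lo, lo, hi],                # one garbage column between characters
    [lo, hi, lo], [lo, hi, lo],  # two columns of "b"
]
decoded = viterbi_decode(columns, labels)
```

In this sketch the garbage state is what separates two consecutive occurrences of the same character: without it, a run such as "aa" would collapse to a single "a".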
Total citations
[Chart: citations per year, 2013 to 2023; per-year counts not recoverable]
Scholar articles
SF Rashid, F Shafait, TM Breuel - 2012 10th IAPR International Workshop on document …, 2012