Authors
Syed Saqib Bukhari, Faisal Shafait, Thomas M Breuel
Publication date
2011/1/24
Conference
Document Recognition and Retrieval XVIII
Volume
7874
Pages
109-116
Publisher
SPIE
Description
Page segmentation into text and non-text elements is an essential preprocessing step before optical character recognition (OCR). With poor segmentation, an OCR classification engine produces garbage characters due to the presence of non-text elements. This paper describes modifications to the text/non-text segmentation algorithm presented by Bloomberg [1], which is also available in his open-source Leptonica library [2]. The modifications result in significant improvements, achieving better segmentation accuracy than the original algorithm on the UW-III, UNLV, ICDAR 2009 page segmentation competition test images, and circuit diagram datasets.
Total citations
[Chart: citations per year, 2011 to 2024]