Survey of post-OCR processing approaches

TTH Nguyen, A Jatowt, M Coustaty… - ACM Computing Surveys …, 2021 - dl.acm.org
Optical character recognition (OCR) is one of the most popular techniques used for
converting printed documents into machine-readable ones. While OCR engines can do well …

OCR post-processing error correction algorithm using Google online spelling suggestion

Y Bassil, M Alwani - arXiv preprint arXiv:1204.0191, 2012 - arxiv.org
With the advent of digital optical scanners, a lot of paper-based books, textbooks,
magazines, articles, and documents are being transformed into an electronic version that …

An OCR post-correction approach using deep learning for processing medical reports

S Karthikeyan, AGS de Herrera… - IEEE Transactions on …, 2021 - ieeexplore.ieee.org
According to a recent Deloitte study, the COVID-19 pandemic continues to place a huge
strain on the global health care sector. COVID-19 has also catalysed digital transformation …

OCR context-sensitive error correction based on Google Web 1T 5-gram data set

Y Bassil, M Alwani - arXiv preprint arXiv:1204.0188, 2012 - arxiv.org
Since the dawn of the computing era, information has been represented digitally so that it
can be processed by electronic computers. Paper books and documents were abundant and …

Generating a training corpus for OCR post-correction using encoder-decoder model

E D'hondt, C Grouin, B Grau - Proceedings of the Eighth …, 2017 - aclanthology.org
In this paper we present a novel approach to the automatic correction of OCR-induced
orthographic errors in a given text. While current systems depend heavily on large training …

A weighted finite-state framework for correcting errors in natural scene OCR

R Beaufort, C Mancas-Thillou - Ninth International Conference …, 2007 - ieeexplore.ieee.org
With the increasing market of cheap cameras, natural scene text has to be handled in an
efficient way. Some works deal with text detection in the image while more recent ones point …

Toward the optimized crowdsourcing strategy for OCR post-correction

O Suissa, A Elmalech… - Aslib Journal of …, 2020 - emerald.com
Purpose: Digitization of historical documents is a challenging task in many digital humanities
projects. A popular approach for digitization is to scan the documents into images, and then …

A multi-stage approach to Arabic document analysis

E Borovikov, I Zavorin - Guide to OCR for Arabic scripts, 2012 - Springer
We approach the analysis of electronic documents as a multi-stage process, which we
implement via a multi-filter document processing framework that provides (a) flexibility for …

A hardware-based surveillance video camera watermark

R van Schyndel - … on Digital Image Computing: Techniques and …, 2010 - ieeexplore.ieee.org
This paper arose out of a need for marking surveillance video in a simple manner that would
allow the integrity of that video against later manipulation to be assured from the camera to …

A multi-evidence, multi-engine OCR system

I Zavorin, E Borovikov, A Borovikov… - … and Retrieval XIV, 2007 - spiedigitallibrary.org
Although modern OCR technology is capable of handling a wide variety of document
images, there is no single OCR engine that performs equally well on all documents for a …