Document layout analysis: a comprehensive survey

GM Binmakhashen, SA Mahmoud - ACM Computing Surveys (CSUR), 2019 - dl.acm.org
Document layout analysis (DLA) is a preprocessing step of document understanding
systems. It is responsible for detecting and annotating the physical structure of documents …

A comprehensive survey of mostly textual document segmentation algorithms since 2008

S Eskenazi, P Gomez-Krämer, JM Ogier - Pattern recognition, 2017 - Elsevier
In document image analysis, segmentation is the task that identifies the regions of a
document. The increasing number of applications of document analysis requires a good …

DocBank: A benchmark dataset for document layout analysis

M Li, Y Xu, L Cui, S Huang, F Wei, Z Li… - arXiv preprint arXiv …, 2020 - arxiv.org
Document layout analysis usually relies on computer vision models to understand
documents while ignoring textual information that is vital to capture. Meanwhile, high quality …

Fast CNN-based document layout analysis

DAB Oliveira, MP Viana - 2017 IEEE International …, 2017 - openaccess.thecvf.com
Automatic document layout analysis is a crucial step in cognitive computing and processes
that extract information out of document images, such as specific-domain knowledge …

Text and non-text separation in offline document images: a survey

S Bhowmik, R Sarkar, M Nasipuri… - International Journal on …, 2018 - Springer
Separation of text and non-text is an essential processing step for any document analysis
system. Therefore, it is important to have a clear understanding of the state-of-the-art of …

Layout analysis for Arabic historical document images using machine learning

SS Bukhari, TM Breuel, A Asi… - … conference on frontiers …, 2012 - ieeexplore.ieee.org
Page layout analysis is a fundamental step of any document image understanding system.
We introduce an approach that segments text appearing in page margins (aka side-notes …

Printer identification using supervised learning for document forgery detection

S Elkasrawi, F Shafait - 2014 11th IAPR International Workshop …, 2014 - ieeexplore.ieee.org
Identifying the source printer of a document is important in forgery detection. The larger the
number of documents to be investigated for forgery, the less time-efficient manual …

Text and non-text segmentation based on connected component features

VP Le, N Nayef, M Visani, JM Ogier… - … on document analysis …, 2015 - ieeexplore.ieee.org
Document image segmentation is crucial to OCR and other digitization processes. In this
paper, we present a learning-based approach for text and non-text separation in document …

Improved document image segmentation algorithm using multiresolution morphology

SS Bukhari, F Shafait, TM Breuel - Document recognition and …, 2011 - spiedigitallibrary.org
Page segmentation into text and non-text elements is an essential preprocessing step
before optical character recognition (OCR) operation. In case of poor segmentation, an OCR …

BINYAS: a complex document layout analysis system

S Bhowmik, S Kundu, R Sarkar - Multimedia Tools and Applications, 2021 - Springer
Document layout analysis (DLA) is an irreplaceable pre-requisite for the development of a
comprehensive document image processing and analysis system. The main purpose of DLA …