ScanBank: A benchmark dataset for figure extraction from scanned electronic theses and dissertations

SY Kahu, WA Ingram, EA Fox, J Wu - arXiv preprint arXiv:2106.15320, 2021 - arxiv.org
We focus on electronic theses and dissertations (ETDs), aiming to improve access and
expand their utility, since more than 6 million are publicly available, and they constitute an …

Figure extraction from scanned electronic theses and dissertations

SY Kahu - 2020 - vtechworks.lib.vt.edu
The ability to extract figures and tables from scientific documents can solve key use-cases
such as their semantic parsing, summarization, or indexing. Although a few methods have …

Extracting scientific figures with distantly supervised neural networks

N Siegel, N Lourie, R Power, W Ammar - … of the 18th ACM/IEEE on joint …, 2018 - dl.acm.org
Non-textual components such as charts, diagrams and tables provide key information in
many scientific documents, but the lack of large labeled datasets has impeded the …

Automatic extraction of figures from scholarly documents

S Ray Choudhury, P Mitra, CL Giles - … of the 2015 ACM symposium on …, 2015 - dl.acm.org
Scholarly papers (journal and conference papers, technical reports, etc.) usually contain
multiple "figures" such as plots, flow charts and other images which are generated manually …

Unveiling Document Structures with YOLOv5 Layout Detection

H Sugiharto, Y Silviana, YS Nurpazrin - arXiv preprint arXiv:2309.17033, 2023 - arxiv.org
The current digital environment is characterized by the widespread presence of data,
particularly unstructured data, which poses many issues in sectors including finance …

Parsing electronic theses and dissertations using object detection

A Ahuja, A Devera, EA Fox - Proceedings of the first Workshop on …, 2022 - aclanthology.org
Electronic theses and dissertations (ETDs) contain valuable knowledge that can be useful
for a wide range of purposes. To effectively utilize the knowledge contained in ETDs for …

Fast and accurate deep learning model for stamps detection for embedded devices

A Gayer, D Ershova, V Arlazarov - Pattern Recognition and Image Analysis, 2022 - Springer
The search for stamps on images is necessary to verify the authenticity of a document and
extract valuable textual information contained in them. Despite the vast number of methods …

A heuristic baseline method for metadata extraction from scanned electronic theses and dissertations

MH Choudhury, J Wu, WA Ingram, EA Fox - Proceedings of the ACM …, 2020 - dl.acm.org
Extracting metadata from scholarly papers is an important text mining problem. Widely used
open-source tools such as GROBID are designed for born-digital scholarly papers but often …

Convolutional neural networks for figure extraction in historical technical documents

CN Yu, CC Levy, I Saniee - 2017 14th IAPR International …, 2017 - ieeexplore.ieee.org
We present a method of extracting figures and images from the pages of scanned
documents, especially from technical research articles. Our approach is novel in two key …

DocReader: bounding-box free training of a document information extraction model

S Klaiman, M Lehne - Document Analysis and Recognition–ICDAR 2021 …, 2021 - Springer
Information extraction from documents is a ubiquitous first step in many business
applications. During this step, the entries of various fields must first be read from the images …