Extracting bibliographical data for PDF documents with HMM and external resources

WF Hsiao, TM Chang, E Thomas - Program, 2014 - emerald.com
Purpose: The purpose of this paper is to propose an automatic metadata extraction and
retrieval system to extract bibliographical information from digital academic documents in …

Bootstrapping multilingual metadata extraction: a showcase in Cyrillic

J Krause, I Shapiro, T Saier… - Proceedings of the Second …, 2021 - aclanthology.org
Applications based on scholarly data are of ever increasing importance. This results in
disadvantages for areas where high-quality data and compatible systems are not available …

New methods for metadata extraction from scientific literature

D Tkaczyk - arXiv preprint arXiv:1710.10201, 2017 - arxiv.org
Within the past few decades we have witnessed digital revolution, which moved scholarly
communication to electronic media and also resulted in a substantial increase in its volume …

GROBID: Combining automatic bibliographic data recognition and term extraction for scholarship publications

P Lopez - Research and Advanced Technology for Digital …, 2009 - Springer
Based on state of the art machine learning techniques, GROBID (GeneRation Of
BIbliographic Data) performs reliable bibliographic data extractions from scholar articles …

A benchmark of PDF information extraction tools using a multi-task and multi-domain evaluation framework for academic documents

N Meuschke, A Jagdale, T Spinde, J Mitrović… - International Conference …, 2023 - Springer
Extracting information from academic PDF documents is crucial for numerous indexing,
retrieval, and analysis use cases. Choosing the best tool to extract specific content elements …

An End-to-End Pipeline for Bibliography Extraction from Scientific Articles

B Joshi, A Symeonidou, SM Danish… - Proceedings of the …, 2023 - aclanthology.org
We introduce a comprehensive end-to-end pipeline designed to extract complete
bibliography section from English scientific articles in digital-born PDF format and further …

OCR++: a robust framework for information extraction from scholarly articles

M Singh, B Barua, P Palod, M Garg… - arXiv preprint arXiv …, 2016 - arxiv.org
This paper proposes OCR++, an open-source framework designed for a variety of
information extraction tasks from scholarly articles including metadata (title, author names …

Building an annotated corpus for automatic metadata extraction from multilingual journal article references

W Choi, HM Yoon, MH Hyun, HJ Lee, JW Seol, KD Lee… - PloS one, 2023 - journals.plos.org
Bibliographic references containing citation information of academic literature play an
important role as a medium connecting earlier and recent studies. As references contain …

Metadata extraction from bibliographies using bigram HMM

P Yin, M Zhang, ZH Deng, DQ Yang - Digital Libraries: International …, 2005 - Springer
In recent years, we have seen huge volumes of research papers available on the World
Wide Web. Metadata provides a good approach for organizing and retrieving these useful …

Structured references from PDF articles: assessing the tools for bibliographic reference extraction and parsing

A Cioffi, S Peroni - International Conference on Theory and Practice of …, 2022 - Springer
Many solutions have been provided to extract bibliographic references from PDF papers.
Machine learning, rule-based and regular expressions approaches were among the most …