Accurate document layout analysis is a key requirement for high-quality PDF document conversion. With the recent availability of large, public ground-truth datasets such as …
Tables organize valuable content in a concise and compact representation. This content is extremely valuable for systems such as search engines, Knowledge Graphs, etc., since they …
Images depicting dark skin tones are significantly underrepresented in the educational materials used to teach primary care physicians and dermatologists to recognize skin …
Portable Document Format (PDF) files are one of the most widely used file types. This has incentivized hackers to develop methods to use these normally innocent PDF files to …
Artificial intelligence (AI) has become a disruptive force in many industries over the past few decades, and the subjects of material science and engineering are no exception. This …
Accurately extracting structured content from PDFs is a critical first step for NLP over scientific papers. Recent work has improved extraction accuracy by incorporating …
Foundational Models (FMs) have demonstrated unprecedented capabilities including zero-shot learning, high-fidelity data synthesis, and out-of-domain generalization …
Extracting information from academic PDF documents is crucial for numerous indexing, retrieval, and analysis use cases. Choosing the best tool to extract specific content elements …
Transforming documents into machine-processable representations is a challenging task due to their complex structures and variability in formats. Recovering the layout structure and …