Visually-Rich Document Understanding: Concepts, Taxonomy and Challenges

A Sassioui, R Benouini, Y El Ouargui… - … Networks and Mobile …, 2023 - ieeexplore.ieee.org
A Sassioui, R Benouini, Y El Ouargui, M El Kamili, M Chergui, M Ouzzif
2023 10th International Conference on Wireless Networks and Mobile …, 2023 - ieeexplore.ieee.org
The increasing prevalence of Visually-rich Documents (VRDs) in diverse domains has led to a growing interest in Visually-rich Document Understanding (VrDU). Researchers have been developing sophisticated systems to extract explicit and implicit information from such documents. However, a comprehensive literature review of this field is still missing. In this paper, we present a comprehensive overview of VrDU systems, focusing on their applications in Key Information Extraction (KIE), Visual Question Answering (VQA) and document classification tasks. We delve into the fundamental concepts and techniques used in VrDU, providing a high-level abstraction of these approaches. To facilitate understanding and comparison, we propose a taxonomy that categorizes VrDU systems based on four key aspects: modality, geometric approach, OCR integration type and task. Additionally, we present the benchmarks commonly used in the field and discuss the current challenges faced by VrDU systems. We believe that this paper advances research in the field of VrDU by providing a comprehensive resource for researchers and practitioners.