LayoutLMv2: Multi-modal pre-training for visually-rich document understanding

Y Xu, Y Xu, T Lv, L Cui, F Wei, G Wang, Y Lu… - arXiv preprint arXiv …, 2020 - arxiv.org
… To fine-tune LayoutLMv2 models on these tasks, we build a token-level classification layer
above the text part of the output representations to predict the BIO tags for each entity field. …

SelfDoc: Self-supervised document representation learning

P Li, J Gu, J Kuen, VI Morariu, H Zhao… - Proceedings of the …, 2021 - openaccess.thecvf.com
… : document entity recognition, document classification, and … to the advancement of document
analysis and intelligence by … information extraction from visually rich documents. In ACL, …

DocBank: A benchmark dataset for document layout analysis

M Li, Y Xu, L Cui, S Huang, F Wei, Z Li… - arXiv preprint arXiv …, 2020 - arxiv.org
… The document layout analysis task is to extract the pre-defined semantic units in visually
rich documents. … We classify all the tokens by the type of semantic structures on a page of the …

SPECTER: Document-level representation learning using citation-informed transformers

A Cohan, S Feldman, I Beltagy, D Downey… - arXiv preprint arXiv …, 2020 - arxiv.org
… A paper’s title and abstract provide rich semantic content about … For our evaluation, we
derive a document classification dataset … In this section, we analyze several design decisions in …

A comprehensive survey on word representation models: From classical to state-of-the-art word representation language models

U Naseem, I Razzak, SK Khan, M Prasad - Transactions on Asian and …, 2021 - dl.acm.org
… imperative, given that it is rich in information and can be used … the representation of text for
low-quality text Classification and … pre-processing techniques analyzed in our study are briefly …

Deep learning-based text classification: a comprehensive review

S Minaee, N Kalchbrenner, E Cambria… - ACM computing …, 2021 - dl.acm.org
… We provide a quantitative analysis of the performance of a selected set of DL models on 16
… -structured network typologies, to learn rich semantic representations. The authors argue that …

A comparative study of automated legal text classification using random forests and deep learning

H Chen, L Wu, J Chen, W Lu, J Ding - Information Processing & …, 2022 - Elsevier
… Based on the experimental results and analysis, we further … Even BiLSTM can capture more
comprehensive and rich … equally to the representation of the document. For example, in the …

Document ranking with a pretrained sequence-to-sequence model

R Nogueira, Z Jiang, J Lin - arXiv preprint arXiv:2003.06713, 2020 - arxiv.org
… -rich regime, with lots of training examples, our method can outperform a pure classification-…
by “connecting” fine-tuned latent representations of relevance to related output “target words”…

Every document owns its structure: Inductive text classification via graph neural networks

Y Zhang, X Yu, Z Cui, S Wu, Z Wen, L Wang - arXiv preprint arXiv …, 2020 - arxiv.org
… Text classification is one of the primary tasks in the NLP field, as it provides fundamental
methodologies for other NLP tasks, such as spam filtering, sentiment analysis, intent detection, …

LayoutLMv3: Pre-training for Document AI with unified text and image masking

Y Huang, T Lv, L Cui, Y Lu, F Wei - Proceedings of the 30th ACM …, 2022 - dl.acm.org
… document visual question answering, but also in image-centric tasks such as document image
classification and document layout analysis. … ) aims to learn rich visual representations via …