Rich document representation and classification: An analysis - Academic Resource Search

LayoutLMv2: Multi-modal pre-training for visually-rich document understanding

Y Xu, Y Xu, T Lv, L Cui, F Wei, G Wang, Y Lu… - arXiv preprint arXiv …, 2020 - arxiv.org

… To fine-tune LayoutLMv2 models on these tasks, we build a token-level classification layer
above the text part of the output representations to predict the BIO tags for each entity field. …

Cited by 430 · Related articles · All 8 versions

[PDF] thecvf.com

SelfDoc: Self-supervised document representation learning

P Li, J Gu, J Kuen, VI Morariu, H Zhao… - Proceedings of the …, 2021 - openaccess.thecvf.com

… : document entity recognition, document classification, and … to the advancement of document
analysis and intelligence by … information extraction from visually rich documents. In ACL, …

Cited by 138 · Related articles · All 8 versions

[PDF] arxiv.org

DocBank: A benchmark dataset for document layout analysis

M Li, Y Xu, L Cui, S Huang, F Wei, Z Li… - arXiv preprint arXiv …, 2020 - arxiv.org

… The document layout analysis task is to extract the pre-defined semantic units in visually
rich documents. … We classify all the token by the type of semantic structures on a page of the …

Cited by 167 · Related articles · All 5 versions

[PDF] arxiv.org

SPECTER: Document-level representation learning using citation-informed transformers

A Cohan, S Feldman, I Beltagy, D Downey… - arXiv preprint arXiv …, 2020 - arxiv.org

… A paper’s title and abstract provide rich semantic content about … For our evaluation, we
derive a document classification dataset … In this section, we analyze several design decisions in …

Cited by 443 · Related articles · All 9 versions

[PDF] arxiv.org

A comprehensive survey on word representation models: From classical to state-of-the-art word representation language models

U Naseem, I Razzak, SK Khan, M Prasad - Transactions on Asian and …, 2021 - dl.acm.org

… imperative, given that it is rich in information and can be used … the representation of text for
low-quality text Classification and … pre-processing techniques analyzed in our study are briefly …

Cited by 199 · Related articles · All 10 versions

[PDF] arxiv.org

Deep learning-based text classification: a comprehensive review

S Minaee, N Kalchbrenner, E Cambria… - ACM computing …, 2021 - dl.acm.org

… We provide a quantitative analysis of the performance of a selected set of DL models on 16
… -structured network typologies, to learn rich semantic representations. The authors argue that …

Cited by 1552 · Related articles · All 10 versions

[PDF] sciencedirect.com

A comparative study of automated legal text classification using random forests and deep learning

H Chen, L Wu, J Chen, W Lu, J Ding - Information Processing & …, 2022 - Elsevier

… Based on the experimental results and analysis, we further … Even BiLSTM can capture more
comprehensive and rich … equally to the representation of the document. For example, in the …

Cited by 120 · Related articles · All 3 versions

[PDF] arxiv.org

Document ranking with a pretrained sequence-to-sequence model

R Nogueira, Z Jiang, J Lin - arXiv preprint arXiv:2003.06713, 2020 - arxiv.org

… -rich regime, with lots of training examples, our method can outperform a pure classification-…
by “connecting” fine-tuned latent representations of relevance to related output “target words”…

Cited by 443 · Related articles · All 6 versions

[PDF] arxiv.org

Every document owns its structure: Inductive text classification via graph neural networks

Y Zhang, X Yu, Z Cui, S Wu, Z Wen, L Wang - arXiv preprint arXiv …, 2020 - arxiv.org

… Text classification is one of the primary tasks in the NLP field, as it provides fundamental
methodologies for other NLP tasks, such as spam filtering, sentiment analysis, intent detection, …

Cited by 289 · Related articles · All 8 versions

[PDF] arxiv.org

LayoutLMv3: Pre-training for document AI with unified text and image masking

Y Huang, T Lv, L Cui, Y Lu, F Wei - Proceedings of the 30th ACM …, 2022 - dl.acm.org

… document visual question answering, but also in image-centric tasks such as document image
classification and document layout analysis. … ) aims to learn rich visual representations via …

Cited by 304 · Related articles · All 3 versions
