Hierarchical transformers for long document classification

R Pappagari, P Zelasko, J Villalba… - 2019 IEEE automatic …, 2019 - ieeexplore.ieee.org
BERT, which stands for Bidirectional Encoder Representations from Transformers, is a
recently introduced language representation model based upon the transfer learning …

Task-aware representation of sentences for generic text classification

K Halder, A Akbik, J Krapac… - Proceedings of the 28th …, 2020 - aclanthology.org
State-of-the-art approaches for text classification leverage a transformer architecture with a
linear layer on top that outputs a class distribution for a given prediction problem. While …
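
A minimal sketch of the standard "transformer encoder + linear layer" classifier this snippet describes (not the TARS method proposed in the paper), using Hugging Face Transformers and PyTorch; the model name and number of classes are illustrative assumptions.

```python
import torch
from transformers import AutoModel, AutoTokenizer

class TransformerClassifier(torch.nn.Module):
    def __init__(self, model_name="bert-base-uncased", num_classes=4):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(model_name)
        # Linear head maps the [CLS] representation to class logits.
        self.head = torch.nn.Linear(self.encoder.config.hidden_size, num_classes)

    def forward(self, input_ids, attention_mask):
        out = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        cls = out.last_hidden_state[:, 0]              # [CLS] token vector
        return torch.softmax(self.head(cls), dim=-1)   # class distribution

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
batch = tokenizer(["an example document"], return_tensors="pt")
probs = TransformerClassifier()(batch["input_ids"], batch["attention_mask"])
```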

How to fine-tune BERT for text classification?

C Sun, X Qiu, Y Xu, X Huang - … : 18th China national conference, CCL 2019 …, 2019 - Springer
Language model pre-training has proven useful for learning universal
language representations. As a state-of-the-art pre-trained language model, BERT …
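
An illustrative fine-tuning sketch (not the paper's specific recipe): a single training step for BERT text classification with Hugging Face Transformers; the dataset, label count, and learning rate are assumptions.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)  # small LR is typical when fine-tuning

texts, labels = ["good movie", "terrible plot"], torch.tensor([1, 0])
batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")

model.train()
loss = model(**batch, labels=labels).loss  # cross-entropy over the classification head
loss.backward()
optimizer.step()
```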

A Comparison of LSTM and BERT for Small Corpus

A Ezen-Can - arXiv preprint arXiv:2009.05451, 2020 - arxiv.org
Recent advancements in the NLP field showed that transfer learning helps with achieving
state-of-the-art results for new tasks by tuning pre-trained models instead of starting from …

Revisiting transformer-based models for long document classification

X Dai, I Chalkidis, S Darkner, D Elliott - arXiv preprint arXiv:2204.06683, 2022 - arxiv.org
The recent literature in text classification is biased towards short text sequences (e.g.,
sentences or paragraphs). In real-world applications, multi-page multi-paragraph documents …

FNet: Mixing tokens with Fourier transforms

J Lee-Thorp, J Ainslie, I Eckstein, S Ontanon - arXiv preprint arXiv …, 2021 - arxiv.org
We show that Transformer encoder architectures can be sped up, with limited accuracy
costs, by replacing the self-attention sublayers with simple linear transformations that "mix" …
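
A rough sketch of the token-mixing idea the abstract describes (not the authors' implementation): replace the self-attention sublayer with a parameter-free 2D Fourier transform over the hidden and sequence dimensions, keeping only the real part.

```python
import torch

def fourier_mixing(x: torch.Tensor) -> torch.Tensor:
    """x: (batch, seq_len, hidden). Parameter-free mixing sublayer."""
    # FFT along the hidden dimension, then along the sequence dimension;
    # the real part is passed on to the feed-forward sublayer.
    return torch.fft.fft(torch.fft.fft(x, dim=-1), dim=-2).real

x = torch.randn(2, 128, 768)
print(fourier_mixing(x).shape)  # torch.Size([2, 128, 768]), same shape as x
```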

An exploration of hierarchical attention transformers for efficient long document classification

I Chalkidis, X Dai, M Fergadiotis, P Malakasiotis… - arXiv preprint arXiv …, 2022 - arxiv.org
Non-hierarchical sparse attention Transformer-based models, such as Longformer and Big
Bird, are popular approaches to working with long documents. There are clear benefits to …
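
For contrast with the sparse-attention models named in the snippet, here is a minimal sketch of the hierarchical idea (a segment encoder followed by a segment-level transformer), not the HAT implementation from the paper; segment handling, model name, and layer sizes are assumptions.

```python
import torch
from transformers import AutoModel

class HierarchicalClassifier(torch.nn.Module):
    def __init__(self, model_name="bert-base-uncased", num_classes=2):
        super().__init__()
        self.segment_encoder = AutoModel.from_pretrained(model_name)
        hidden = self.segment_encoder.config.hidden_size
        # A small transformer layer attends across segment-level [CLS] vectors.
        self.doc_layer = torch.nn.TransformerEncoderLayer(
            d_model=hidden, nhead=8, batch_first=True)
        self.head = torch.nn.Linear(hidden, num_classes)

    def forward(self, segment_input_ids, segment_attention_mask):
        # segment_input_ids: (num_segments, seq_len) for one long document
        out = self.segment_encoder(input_ids=segment_input_ids,
                                   attention_mask=segment_attention_mask)
        seg_cls = out.last_hidden_state[:, 0].unsqueeze(0)  # (1, num_segments, hidden)
        doc = self.doc_layer(seg_cls).mean(dim=1)           # pool over segments
        return self.head(doc)                               # document-level logits
```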

What all do audio transformer models hear? Probing acoustic representations for language delivery and its structure

J Shah, YK Singla, C Chen, RR Shah - arXiv preprint arXiv:2101.00387, 2021 - arxiv.org
In recent times, BERT-based transformer models have become an inseparable part of
the 'tech stack' of text processing models. Similar progress is being observed in the speech …

Multilingual is not enough: BERT for Finnish

A Virtanen, J Kanerva, R Ilo, J Luoma… - arXiv preprint arXiv …, 2019 - arxiv.org
Deep learning-based language models pretrained on large unannotated text corpora have
been demonstrated to allow efficient transfer learning for natural language processing, with …

Mockingjay: Unsupervised speech representation learning with deep bidirectional transformer encoders

AT Liu, S Yang, PH Chi, P Hsu… - ICASSP 2020-2020 IEEE …, 2020 - ieeexplore.ieee.org
We present Mockingjay as a new speech representation learning approach, where
bidirectional Transformer encoders are pre-trained on a large amount of unlabeled speech …