Hierarchical transformers for long document classification

R Pappagari, P Zelasko, J Villalba… - 2019 IEEE automatic …, 2019 - ieeexplore.ieee.org
BERT, which stands for Bidirectional Encoder Representations from Transformers, is a
recently introduced language representation model based upon the transfer learning …

Task-aware representation of sentences for generic text classification

K Halder, A Akbik, J Krapac… - Proceedings of the 28th …, 2020 - aclanthology.org
State-of-the-art approaches for text classification leverage a transformer architecture with a
linear layer on top that outputs a class distribution for a given prediction problem. While …
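
A minimal sketch of the standard "transformer encoder + linear layer" classifier this snippet describes (not the TARS method proposed in the paper), using Hugging Face Transformers and PyTorch; the model name and number of classes are illustrative assumptions.

```python
import torch
from transformers import AutoModel, AutoTokenizer

class TransformerClassifier(torch.nn.Module):
    def __init__(self, model_name="bert-base-uncased", num_classes=4):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(model_name)
        # Linear head maps the [CLS] representation to class logits.
        self.head = torch.nn.Linear(self.encoder.config.hidden_size, num_classes)

    def forward(self, input_ids, attention_mask):
        out = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        cls = out.last_hidden_state[:, 0]              # [CLS] token vector
        return torch.softmax(self.head(cls), dim=-1)   # class distribution

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
batch = tokenizer(["an example document"], return_tensors="pt")
probs = TransformerClassifier()(batch["input_ids"], batch["attention_mask"])
```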

How to fine-tune BERT for text classification?

C Sun, X Qiu, Y Xu, X Huang - … : 18th China national conference, CCL 2019 …, 2019 - Springer
Language model pre-training has proven useful for learning universal
language representations. As a state-of-the-art pre-trained language model, BERT …
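
An illustrative fine-tuning sketch (not the paper's specific recipe): a single training step for BERT text classification with Hugging Face Transformers; the dataset, label count, and learning rate are assumptions.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)  # small LR is typical when fine-tuning

texts, labels = ["good movie", "terrible plot"], torch.tensor([1, 0])
batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")

model.train()
loss = model(**batch, labels=labels).loss  # cross-entropy over the classification head
loss.backward()
optimizer.step()
```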

A Comparison of LSTM and BERT for Small Corpus

A Ezen-Can - arXiv preprint arXiv:2009.05451, 2020 - arxiv.org
Recent advancements in the NLP field showed that transfer learning helps with achieving
state-of-the-art results for new tasks by tuning pre-trained models instead of starting from …

Revisiting transformer-based models for long document classification

X Dai, I Chalkidis, S Darkner, D Elliott - arXiv preprint arXiv:2204.06683, 2022 - arxiv.org
The recent literature in text classification is biased towards short text sequences (e.g.,
sentences or paragraphs). In real-world applications, multi-page multi-paragraph documents …

FNet: Mixing tokens with Fourier transforms

J Lee-Thorp, J Ainslie, I Eckstein, S Ontanon - arXiv preprint arXiv …, 2021 - arxiv.org
We show that Transformer encoder architectures can be sped up, with limited accuracy
costs, by replacing the self-attention sublayers with simple linear transformations that "mix" …
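
A rough sketch of the token-mixing idea the abstract describes (not the authors' implementation): replace the self-attention sublayer with a parameter-free 2D Fourier transform over the hidden and sequence dimensions, keeping only the real part.

```python
import torch

def fourier_mixing(x: torch.Tensor) -> torch.Tensor:
    """x: (batch, seq_len, hidden). Parameter-free mixing sublayer."""
    # FFT along the hidden dimension, then along the sequence dimension;
    # the real part is passed on to the feed-forward sublayer.
    return torch.fft.fft(torch.fft.fft(x, dim=-1), dim=-2).real

x = torch.randn(2, 128, 768)
print(fourier_mixing(x).shape)  # torch.Size([2, 128, 768]), same shape as x
```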

An exploration of hierarchical attention transformers for efficient long document classification

I Chalkidis, X Dai, M Fergadiotis, P Malakasiotis… - arXiv preprint arXiv …, 2022 - arxiv.org
Non-hierarchical sparse attention Transformer-based models, such as Longformer and Big
Bird, are popular approaches to working with long documents. There are clear benefits to …
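
For contrast with the sparse-attention models named in the snippet, here is a minimal sketch of the hierarchical idea (a segment encoder followed by a segment-level transformer), not the HAT implementation from the paper; segment handling, model name, and layer sizes are assumptions.

```python
import torch
from transformers import AutoModel

class HierarchicalClassifier(torch.nn.Module):
    def __init__(self, model_name="bert-base-uncased", num_classes=2):
        super().__init__()
        self.segment_encoder = AutoModel.from_pretrained(model_name)
        hidden = self.segment_encoder.config.hidden_size
        # A small transformer layer attends across segment-level [CLS] vectors.
        self.doc_layer = torch.nn.TransformerEncoderLayer(
            d_model=hidden, nhead=8, batch_first=True)
        self.head = torch.nn.Linear(hidden, num_classes)

    def forward(self, segment_input_ids, segment_attention_mask):
        # segment_input_ids: (num_segments, seq_len) for one long document
        out = self.segment_encoder(input_ids=segment_input_ids,
                                   attention_mask=segment_attention_mask)
        seg_cls = out.last_hidden_state[:, 0].unsqueeze(0)  # (1, num_segments, hidden)
        doc = self.doc_layer(seg_cls).mean(dim=1)           # pool over segments
        return self.head(doc)                               # document-level logits
```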

What all do audio transformer models hear? Probing acoustic representations for language delivery and its structure

J Shah, YK Singla, C Chen, RR Shah - arXiv preprint arXiv:2101.00387, 2021 - arxiv.org
In recent times, BERT-based transformer models have become an inseparable part of
the 'tech stack' of text processing models. Similar progress is being observed in the speech …

Multilingual is not enough: BERT for Finnish

A Virtanen, J Kanerva, R Ilo, J Luoma… - arXiv preprint arXiv …, 2019 - arxiv.org
Deep learning-based language models pretrained on large unannotated text corpora have
been demonstrated to allow efficient transfer learning for natural language processing, with …

Mockingjay: Unsupervised speech representation learning with deep bidirectional transformer encoders

AT Liu, S Yang, PH Chi, P Hsu… - ICASSP 2020-2020 IEEE …, 2020 - ieeexplore.ieee.org
We present Mockingjay as a new speech representation learning approach, where
bidirectional Transformer encoders are pre-trained on a large amount of unlabeled speech …