State-of-the-art approaches for text classification leverage a transformer architecture with a linear layer on top that outputs a class distribution for a given prediction problem. While …
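For concreteness, here is a minimal sketch of that kind of architecture in PyTorch: a small transformer encoder whose pooled output feeds a single linear layer producing a class distribution. All names and sizes (TransformerClassifier, hidden_size, num_classes, and so on) are illustrative assumptions, not taken from any of the cited papers.

import torch
import torch.nn as nn

class TransformerClassifier(nn.Module):
    """Transformer encoder with a linear layer on top that maps the pooled
    sequence representation to a distribution over classes."""
    def __init__(self, vocab_size=30000, hidden_size=256, num_classes=4,
                 num_layers=2, num_heads=4, max_len=512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden_size)
        self.pos = nn.Embedding(max_len, hidden_size)
        layer = nn.TransformerEncoderLayer(d_model=hidden_size, nhead=num_heads,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=num_layers)
        self.classifier = nn.Linear(hidden_size, num_classes)

    def forward(self, token_ids):
        # token_ids: (batch, seq) integer tensor
        positions = torch.arange(token_ids.size(1), device=token_ids.device)
        x = self.embed(token_ids) + self.pos(positions)
        x = self.encoder(x)                          # (batch, seq, hidden)
        pooled = x[:, 0]                             # first-token ("[CLS]"-style) pooling
        return self.classifier(pooled).softmax(-1)   # (batch, num_classes) distribution

# Illustrative usage: a batch of 2 sequences of 16 token ids.
probs = TransformerClassifier()(torch.randint(0, 30000, (2, 16)))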
C Sun, X Qiu, Y Xu, X Huang - …: 18th China National Conference, CCL 2019 …, 2019 - Springer
Language model pre-training has proven to be useful in learning universal language representations. As a state-of-the-art language model pre-training model, BERT …
A Ezen-Can - arXiv preprint arXiv:2009.05451, 2020 - arxiv.org
Recent advances in NLP have shown that transfer learning helps achieve state-of-the-art results on new tasks by tuning pre-trained models instead of starting from …
The recent literature in text classification is biased towards short text sequences (e.g., sentences or paragraphs). In real-world applications, multi-page, multi-paragraph documents …
We show that Transformer encoder architectures can be sped up, with limited accuracy costs, by replacing the self-attention sublayers with simple linear transformations that "mix" …
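The idea of swapping self-attention for a fixed, parameter-free "mixing" step can be sketched as follows. The block below uses a Fourier transform as the mixing operation (one commonly studied choice) with illustrative sizes; it is an assumed sketch of the general technique, not a reproduction of any cited paper's implementation.

import torch
import torch.nn as nn

class TokenMixingBlock(nn.Module):
    """Encoder block in which the self-attention sublayer is replaced by a
    parameter-free linear 'mixing' of tokens (here a Fourier transform),
    followed by the usual position-wise feed-forward sublayer."""
    def __init__(self, hidden_size=256, ffn_size=1024):
        super().__init__()
        self.norm1 = nn.LayerNorm(hidden_size)
        self.norm2 = nn.LayerNorm(hidden_size)
        self.ffn = nn.Sequential(
            nn.Linear(hidden_size, ffn_size),
            nn.GELU(),
            nn.Linear(ffn_size, hidden_size),
        )

    def forward(self, x):                    # x: (batch, seq, hidden)
        # Mix information across tokens with FFTs over the hidden and sequence
        # dimensions and keep the real part; this is a linear transformation
        # with no learned attention weights.
        mixed = torch.fft.fft(torch.fft.fft(x, dim=-1), dim=-2).real
        x = self.norm1(x + mixed)
        x = self.norm2(x + self.ffn(x))
        return x

Because the mixing step has no learned parameters and avoids the quadratic cost of pairwise attention, the per-layer cost is dominated by the feed-forward sublayer, which is broadly why such substitutions can be faster.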
Non-hierarchical sparse attention Transformer-based models, such as Longformer and Big Bird, are popular approaches to working with long documents. There are clear benefits to …
In recent times, BERT-based transformer models have become an inseparable part of the 'tech stack' of text-processing models. Similar progress is being observed in the speech …
A Virtanen, J Kanerva, R Ilo, J Luoma… - arXiv preprint arXiv …, 2019 - arxiv.org
Deep learning-based language models pretrained on large unannotated text corpora have been demonstrated to allow efficient transfer learning for natural language processing, with …
We present Mockingjay as a new speech representation learning approach, where bidirectional Transformer encoders are pre-trained on a large amount of unlabeled speech …