An overview on language models: Recent developments and outlook

C Wei, YC Wang, B Wang, CCJ Kuo - arXiv preprint arXiv:2303.05759, 2023 - arxiv.org
Language modeling studies the probability distribution over strings of text. It is one of the
most fundamental tasks in natural language processing (NLP) and has been widely used in …
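
As a brief aside (a minimal sketch in our own notation, not quoted from the survey), the standard autoregressive formulation assigns a probability to a token string by the chain rule, conditioning each token on its preceding context:

  P(w_1, \dots, w_T) = \prod_{t=1}^{T} P(w_t \mid w_1, \dots, w_{t-1})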

On the dangers of stochastic parrots: Can language models be too big?🦜

EM Bender, T Gebru, A McMillan-Major… - Proceedings of the 2021 …, 2021 - dl.acm.org
The past 3 years of work in NLP have been characterized by the development and
deployment of ever larger language models, especially for English. BERT, its variants, GPT …

Masked language model scoring

J Salazar, D Liang, TQ Nguyen, K Kirchhoff - arXiv preprint arXiv …, 2019 - arxiv.org
Pretrained masked language models (MLMs) require finetuning for most NLP tasks. Instead,
we evaluate MLMs out of the box via their pseudo-log-likelihood scores (PLLs), which are …
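
For context (a sketch in our own notation, not copied from the paper), the pseudo-log-likelihood of a sentence W = (w_1, \dots, w_T) under a masked language model is typically obtained by masking one token at a time and summing the resulting conditional log-probabilities:

  \mathrm{PLL}(W) = \sum_{t=1}^{T} \log P_{\mathrm{MLM}}(w_t \mid W_{\setminus t})

where W_{\setminus t} denotes the sentence with token w_t replaced by the mask token.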

Hyporadise: An open baseline for generative speech recognition with large language models

C Chen, Y Hu, CHH Yang… - Advances in …, 2024 - proceedings.neurips.cc
Advancements in deep neural networks have allowed automatic speech recognition (ASR)
systems to attain human parity on several publicly available clean speech datasets …

Pre-training transformers as energy-based cloze models

K Clark, MT Luong, QV Le, CD Manning - arXiv preprint arXiv:2012.08561, 2020 - arxiv.org
We introduce Electric, an energy-based cloze model for representation learning over text.
Like BERT, it is a conditional generative model of tokens given their contexts. However …

With a little help from my temporal context: Multimodal egocentric action recognition

E Kazakos, J Huh, A Nagrani, A Zisserman… - arXiv preprint arXiv …, 2021 - arxiv.org
In egocentric videos, actions occur in quick succession. We capitalise on the action's
temporal context and propose a method that learns to attend to surrounding actions in order …

Adapting GPT, GPT-2 and BERT language models for speech recognition

X Zheng, C Zhang, PC Woodland - 2021 IEEE Automatic …, 2021 - ieeexplore.ieee.org
Language models (LMs) pre-trained on massive amounts of text, in particular bidirectional
encoder representations from Transformers (BERT), generative pre-training (GPT), and GPT …

Whispering LLaMA: A cross-modal generative error correction framework for speech recognition

S Radhakrishnan, CHH Yang, SA Khan… - arXiv preprint arXiv …, 2023 - arxiv.org
We introduce a new cross-modal fusion technique designed for generative error correction
in automatic speech recognition (ASR). Our methodology leverages both acoustic …

Causal mediation analysis for interpreting neural nlp: The case of gender bias

J Vig, S Gehrmann, Y Belinkov, S Qian, D Nevo… - arXiv preprint arXiv …, 2020 - arxiv.org
Common methods for interpreting neural models in natural language processing typically
examine either their structure or their behavior, but not both. We propose a methodology …

Fast end-to-end speech recognition via non-autoregressive models and cross-modal knowledge transferring from BERT

Y Bai, J Yi, J Tao, Z Tian, Z Wen… - IEEE/ACM Transactions …, 2021 - ieeexplore.ieee.org
Attention-based encoder-decoder (AED) models have achieved promising performance in
speech recognition. However, because the decoder predicts text tokens (such as characters …