AutoML: A survey of the state-of-the-art

X He, K Zhao, X Chu - Knowledge-Based Systems, 2021 - Elsevier
Deep learning (DL) techniques have obtained remarkable achievements on various tasks,
such as image recognition, object detection, and language modeling. However, building a …

Lessons on parameter sharing across layers in transformers

S Takase, S Kiyono - arXiv preprint arXiv:2104.06022, 2021 - arxiv.org
We propose a parameter sharing method for Transformers (Vaswani et al., 2017). The
proposed approach relaxes a widely used technique, which shares parameters for one layer …
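
The snippet is cut off, but the underlying idea (using fewer parameter sets than layers, instead of one set shared by every layer) can be sketched. Below is a minimal PyTorch illustration of a cycle-style assignment; the class name, the `num_param_sets` argument, and the use of `nn.TransformerEncoderLayer` are illustrative assumptions, not the paper's API.

```python
import torch
import torch.nn as nn

class CycleSharedEncoder(nn.Module):
    """Transformer encoder with M parameter sets cycled over N layers.

    Illustrative sketch of cross-layer parameter sharing: with
    num_layers=6 and num_param_sets=3, the layers reuse parameter sets
    in the order [0, 1, 2, 0, 1, 2], so only 3 layers' worth of
    parameters are trained while 6 layers of computation are applied.
    """
    def __init__(self, d_model=512, nhead=8, num_layers=6, num_param_sets=3):
        super().__init__()
        self.shared = nn.ModuleList([
            nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
            for _ in range(num_param_sets)
        ])
        # Cycle assignment: layer i reuses parameter set i mod M.
        self.assignment = [i % num_param_sets for i in range(num_layers)]

    def forward(self, x):
        for idx in self.assignment:
            x = self.shared[idx](x)
        return x

# Example: (batch, seq, d_model) input through the shared stack.
enc = CycleSharedEncoder()
y = enc(torch.randn(2, 10, 512))
```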

Rethinking self-attention: Towards interpretability in neural parsing

K Mrini, F Dernoncourt, Q Tran, T Bui, W Chang… - arXiv preprint arXiv …, 2019 - arxiv.org
Attention mechanisms have improved the performance of NLP tasks while allowing models
to remain explainable. Self-attention is currently widely used; however, interpretability is …
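
As a generic illustration of why self-attention lends itself to inspection (not the paper's specific parsing mechanism), the sketch below returns the attention matrix alongside the output, so each position's output can be traced back to the inputs it attended to. All names are illustrative.

```python
import torch
import torch.nn.functional as F

def self_attention_with_weights(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention that also
    returns the attention matrix for inspection."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.transpose(-2, -1) / (k.size(-1) ** 0.5)
    weights = F.softmax(scores, dim=-1)  # each row is one query's distribution
    return weights @ v, weights

# Example: inspect which tokens the last position attends to.
d = 16
x = torch.randn(5, d)                    # 5 tokens
w_q, w_k, w_v = (torch.randn(d, d) for _ in range(3))
out, attn = self_attention_with_weights(x, w_q, w_k, w_v)
print(attn[-1])                          # attention distribution of token 5
```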

Rethinking perturbations in encoder-decoders for fast training

S Takase, S Kiyono - arXiv preprint arXiv:2104.01853, 2021 - arxiv.org
We often use perturbations to regularize neural models. For neural encoder-decoders,
previous studies applied the scheduled sampling (Bengio et al., 2015) and adversarial …
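
For context, here is a minimal sketch of scheduled sampling, the Bengio et al. (2015) technique the snippet cites, in its common single-pass approximation for encoder-decoders: each gold input token is replaced by the model's own prediction with some probability. Function names and shapes are illustrative.

```python
import torch

def scheduled_sampling_inputs(gold, logits, sample_prob):
    """Mix gold tokens with the model's own greedy predictions.

    gold:   (batch, seq) gold target token ids
    logits: (batch, seq, vocab) model outputs from a previous pass
    Each input token is replaced by the model's prediction for that
    position with probability sample_prob, exposing the model to its
    own errors during training.
    """
    pred = logits.argmax(dim=-1)
    use_pred = torch.rand_like(gold, dtype=torch.float) < sample_prob
    return torch.where(use_pred, pred, gold)

# Example: halfway through the sampling schedule (sample_prob=0.5).
gold = torch.randint(0, 100, (2, 7))
logits = torch.randn(2, 7, 100)
mixed = scheduled_sampling_inputs(gold, logits, sample_prob=0.5)
```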

Algorithmic progress in language models

A Ho, T Besiroglu, E Erdil, D Owen, R Rahman… - arXiv preprint arXiv …, 2024 - arxiv.org
We investigate the rate at which algorithms for pre-training language models have improved
since the advent of deep learning. Using a dataset of over 200 language model evaluations …

Automated source code generation and auto-completion using deep learning: Comparing and discussing current language model-related approaches

J Cruz-Benito, S Vishwakarma, F Martin-Fernandez… - AI, 2021 - mdpi.com
In recent years, the use of deep learning in language models has gained much attention.
Some research projects claim that they can generate text that can be interpreted as human …

Multi-head multi-layer attention to deep language representations for grammatical error detection

M Kaneko, M Komachi - Computación y Sistemas, 2019 - scielo.org.mx
It is known that a deep neural network model pre-trained with large-scale data greatly
improves the accuracy of various tasks, especially when there are resource constraints …
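
A generic sketch of the broader idea of drawing on several layers of a pre-trained encoder for token-level error detection, rather than the top layer alone: the learned softmax mixing below is an illustrative stand-in, not the paper's exact multi-head, multi-layer attention architecture.

```python
import torch
import torch.nn as nn

class MultiLayerTokenClassifier(nn.Module):
    """Token classifier over a weighted mix of all encoder layers.

    Illustrative: learn softmax weights over per-layer hidden states
    of a pre-trained encoder and classify each token (e.g. correct
    vs. erroneous) from the mixture.
    """
    def __init__(self, num_layers, hidden, num_labels=2):
        super().__init__()
        self.layer_logits = nn.Parameter(torch.zeros(num_layers))
        self.classifier = nn.Linear(hidden, num_labels)

    def forward(self, layer_states):
        # layer_states: (num_layers, batch, seq, hidden)
        w = torch.softmax(self.layer_logits, dim=0)
        mixed = (w[:, None, None, None] * layer_states).sum(dim=0)
        return self.classifier(mixed)

# Example with dummy states from a 12-layer, 768-dim encoder.
states = torch.randn(12, 2, 9, 768)
model = MultiLayerTokenClassifier(num_layers=12, hidden=768)
scores = model(states)   # (2, 9, 2) per-token label scores
```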

Advancing State of the Art in Language Modeling

D Herel, T Mikolov - arXiv preprint arXiv:2312.03735, 2023 - arxiv.org
Generalization is arguably the most important goal of statistical language modeling
research. Publicly available benchmarks and papers published with an open-source code …

Calibration, entropy rates, and memory in language models

M Braverman, X Chen, S Kakade… - International …, 2020 - proceedings.mlr.press
Building accurate language models that capture meaningful long-term dependencies is a
core challenge in natural language processing. Towards this end, we present a calibration …
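
A small sketch of the basic quantity involved: the empirical per-token entropy rate implied by a model's log-probabilities, which can be compared between held-out text and the model's own long generations. The function name is illustrative.

```python
import math

def entropy_rate(token_logprobs):
    """Empirical per-token entropy rate in nats: the average negative
    log-probability a model assigns along a token sequence. A gap
    between this value on real text and on the model's own
    generations indicates miscalibration."""
    return -sum(token_logprobs) / len(token_logprobs)

# Example: log-probs of a 4-token sequence under some model.
lp = [math.log(0.5), math.log(0.25), math.log(0.5), math.log(0.125)]
print(entropy_rate(lp))   # ~1.213 nats/token
```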

Character n-gram embeddings to improve RNN language models

S Takase, J Suzuki, M Nagata - Proceedings of the AAAI Conference on …, 2019 - aaai.org
This paper proposes a novel Recurrent Neural Network (RNN) language model that takes
advantage of character information. We focus on character n-grams based on research in …
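
The composition the title describes can be sketched in a fastText-like way: represent a word by pooling embeddings of its character n-grams and feed the result to the RNN language model. The hashing vocabulary and mean pooling below are illustrative assumptions, not necessarily the paper's exact construction.

```python
import torch
import torch.nn as nn

def char_ngrams(word, n=3):
    """Character n-grams of a word with boundary markers."""
    s = f"<{word}>"
    return [s[i:i + n] for i in range(len(s) - n + 1)]

class CharNgramEmbedding(nn.Module):
    """Word embedding built by averaging character n-gram embeddings,
    so rare and unseen words still get informative vectors.
    The hashing trick maps n-grams into a fixed bucket vocabulary."""
    def __init__(self, num_buckets=10000, dim=64, n=3):
        super().__init__()
        self.emb = nn.Embedding(num_buckets, dim)
        self.num_buckets, self.n = num_buckets, n

    def forward(self, word):
        ids = torch.tensor([hash(g) % self.num_buckets
                            for g in char_ngrams(word, self.n)])
        return self.emb(ids).mean(dim=0)   # (dim,) word vector

# Example: embedding usable as RNN language model input.
emb = CharNgramEmbedding()
v = emb("language")
```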