S Takase, S Kiyono - arXiv preprint arXiv:2104.06022, 2021 - arxiv.org
We propose a parameter sharing method for Transformers (Vaswani et al., 2017). The proposed approach relaxes a widely used technique, which shares parameters for one layer …
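The widely used technique this snippet refers to is sharing a single layer's parameters across all layer positions. Below is a minimal PyTorch sketch of that scheme and one possible relaxation, where N layer positions are backed by M < N distinct parameter sets; the cyclic assignment used here is illustrative only, not necessarily the assignment the paper proposes.

```python
import torch
import torch.nn as nn

class SharedEncoder(nn.Module):
    """Transformer encoder that reuses M distinct layers across N layer positions.

    num_unique=1 reproduces the widely used all-layer sharing; 1 < num_unique < num_layers
    is a relaxed sharing scheme (the cyclic assignment below is illustrative only).
    """

    def __init__(self, d_model=512, nhead=8, num_layers=6, num_unique=3):
        super().__init__()
        self.unique_layers = nn.ModuleList(
            nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
            for _ in range(num_unique)
        )
        # Assign each of the num_layers positions to one of the unique parameter sets.
        self.assignment = [i % num_unique for i in range(num_layers)]

    def forward(self, x, padding_mask=None):
        for idx in self.assignment:
            x = self.unique_layers[idx](x, src_key_padding_mask=padding_mask)
        return x

enc = SharedEncoder()               # 6 layer positions, only 3 parameter sets
out = enc(torch.randn(2, 10, 512))  # (batch, seq_len, d_model)
```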
Attention mechanisms have improved the performance of NLP tasks while allowing models to remain explainable. Self-attention is currently widely used; however, interpretability is …
S Takase, S Kiyono - arXiv preprint arXiv:2104.01853, 2021 - arxiv.org
We often use perturbations to regularize neural models. For neural encoder-decoders, previous studies applied scheduled sampling (Bengio et al., 2015) and adversarial …
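As one concrete instance of the perturbation-based regularization this snippet mentions, the sketch below applies an adversarial perturbation to input embeddings; the `model` interface, `adversarial_embedding_loss` helper, and the tiny demo classifier are assumptions for illustration, not the paper's method.

```python
import torch
import torch.nn.functional as F

def adversarial_embedding_loss(model, embeds, labels, epsilon=1.0):
    """Adversarial perturbation of input embeddings as an extra regularization loss.

    `model` is any module mapping embeddings (batch, seq, dim) to logits; this generic
    recipe is one example of perturbation-based regularization, not the paper's method.
    """
    embeds = embeds.detach().requires_grad_(True)
    clean_loss = F.cross_entropy(model(embeds), labels)
    (grad,) = torch.autograd.grad(clean_loss, embeds)
    # Step in the direction that increases the loss, normalized per token.
    delta = epsilon * grad / (grad.norm(dim=-1, keepdim=True) + 1e-12)
    return F.cross_entropy(model(embeds + delta.detach()), labels)

# Hypothetical demo: a linear classifier over flattened embeddings.
model = torch.nn.Sequential(torch.nn.Flatten(start_dim=1), torch.nn.Linear(8 * 16, 4))
adv_loss = adversarial_embedding_loss(model, torch.randn(2, 8, 16), torch.tensor([0, 3]))
adv_loss.backward()  # gradients flow into the model parameters as usual
```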
We investigate the rate at which algorithms for pre-training language models have improved since the advent of deep learning. Using a dataset of over 200 language model evaluations …
In recent years, the use of deep learning in language models has gained much attention. Some research projects claim that they can generate text that can be interpreted as human …
It is known that a deep neural network model pre-trained with large-scale data greatly improves accuracy on various tasks, especially when there are resource constraints …
D Herel, T Mikolov - arXiv preprint arXiv:2312.03735, 2023 - arxiv.org
Generalization is arguably the most important goal of statistical language modeling research. Publicly available benchmarks and papers published with open-source code …
Building accurate language models that capture meaningful long-term dependencies is a core challenge in natural language processing. Towards this end, we present a calibration …
This paper proposes a novel Recurrent Neural Network (RNN) language model that takes advantage of character information. We focus on character n-grams based on research in …
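A common way to exploit character n-grams in an RNN language model is to embed each n-gram and sum the embeddings into a word vector before the recurrent layer. The sketch below shows that construction under assumed names (`char_ngrams`, `CharNgramRNNLM`); the paper's exact composition may differ.

```python
import torch
import torch.nn as nn

def char_ngrams(word, n=3):
    """Character n-grams with boundary markers, e.g. 'cat' -> ['<ca', 'cat', 'at>']."""
    padded = f"<{word}>"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]

class CharNgramRNNLM(nn.Module):
    """RNN language model whose word inputs are sums of character n-gram embeddings."""

    def __init__(self, ngram_vocab, word_vocab, dim=128):
        super().__init__()
        self.ngram_embed = nn.Embedding(ngram_vocab, dim, padding_idx=0)
        self.rnn = nn.LSTM(dim, dim, batch_first=True)
        self.out = nn.Linear(dim, word_vocab)

    def forward(self, ngram_ids):
        # ngram_ids: (batch, seq_len, max_ngrams); index 0 is padding and embeds to zeros.
        word_vecs = self.ngram_embed(ngram_ids).sum(dim=2)
        hidden, _ = self.rnn(word_vecs)
        return self.out(hidden)  # next-word logits at each position

lm = CharNgramRNNLM(ngram_vocab=1000, word_vocab=5000)
logits = lm(torch.randint(0, 1000, (2, 7, 5)))  # batch=2, seq_len=7, 5 n-grams per word
```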