S Takase, S Kiyono - arXiv preprint arXiv:2104.06022, 2021 - arxiv.org
We propose a parameter sharing method for Transformers (Vaswani et al., 2017). The proposed approach relaxes a widely used technique, which shares parameters for one layer …
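The widely used technique this snippet refers to is sharing a single layer's parameters across all layer positions. Below is a minimal PyTorch sketch of that scheme and one possible relaxation, where N layer positions are backed by M < N distinct parameter sets; the cyclic assignment used here is illustrative only, not necessarily the assignment the paper proposes.

```python
import torch
import torch.nn as nn

class SharedEncoder(nn.Module):
    """Transformer encoder that reuses M distinct layers across N layer positions.

    num_unique=1 reproduces the widely used all-layer sharing; 1 < num_unique < num_layers
    is a relaxed sharing scheme (the cyclic assignment below is illustrative only).
    """

    def __init__(self, d_model=512, nhead=8, num_layers=6, num_unique=3):
        super().__init__()
        self.unique_layers = nn.ModuleList(
            nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
            for _ in range(num_unique)
        )
        # Assign each of the num_layers positions to one of the unique parameter sets.
        self.assignment = [i % num_unique for i in range(num_layers)]

    def forward(self, x, padding_mask=None):
        for idx in self.assignment:
            x = self.unique_layers[idx](x, src_key_padding_mask=padding_mask)
        return x

enc = SharedEncoder()               # 6 layer positions, only 3 parameter sets
out = enc(torch.randn(2, 10, 512))  # (batch, seq_len, d_model)
```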
Attention mechanisms have improved the performance of NLP tasks while allowing models to remain explainable. Self-attention is currently widely used; however, interpretability is …
S Takase, S Kiyono - arXiv preprint arXiv:2104.01853, 2021 - arxiv.org
We often use perturbations to regularize neural models. For neural encoder-decoders, previous studies applied scheduled sampling (Bengio et al., 2015) and adversarial …
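As one concrete instance of the perturbation-based regularization this snippet mentions, the sketch below applies an adversarial perturbation to input embeddings; the `model` interface, `adversarial_embedding_loss` helper, and the tiny demo classifier are assumptions for illustration, not the paper's method.

```python
import torch
import torch.nn.functional as F

def adversarial_embedding_loss(model, embeds, labels, epsilon=1.0):
    """Adversarial perturbation of input embeddings as an extra regularization loss.

    `model` is any module mapping embeddings (batch, seq, dim) to logits; this generic
    recipe is one example of perturbation-based regularization, not the paper's method.
    """
    embeds = embeds.detach().requires_grad_(True)
    clean_loss = F.cross_entropy(model(embeds), labels)
    (grad,) = torch.autograd.grad(clean_loss, embeds)
    # Step in the direction that increases the loss, normalized per token.
    delta = epsilon * grad / (grad.norm(dim=-1, keepdim=True) + 1e-12)
    return F.cross_entropy(model(embeds + delta.detach()), labels)

# Hypothetical demo: a linear classifier over flattened embeddings.
model = torch.nn.Sequential(torch.nn.Flatten(start_dim=1), torch.nn.Linear(8 * 16, 4))
adv_loss = adversarial_embedding_loss(model, torch.randn(2, 8, 16), torch.tensor([0, 3]))
adv_loss.backward()  # gradients flow into the model parameters as usual
```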
We investigate the rate at which algorithms for pre-training language models have improved since the advent of deep learning. Using a dataset of over 200 language model evaluations …
In recent years, the use of deep learning in language models has gained much attention. Some research projects claim that they can generate text that can be interpreted as human …
It is known that a deep neural network model pre-trained with large-scale data greatly improves accuracy on various tasks, especially when there are resource constraints …
D Herel, T Mikolov - arXiv preprint arXiv:2312.03735, 2023 - arxiv.org
Generalization is arguably the most important goal of statistical language modeling research. Publicly available benchmarks and papers published with open-source code …
Building accurate language models that capture meaningful long-term dependencies is a core challenge in natural language processing. Towards this end, we present a calibration …
This paper proposes a novel Recurrent Neural Network (RNN) language model that takes advantage of character information. We focus on character n-grams based on research in …
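A common way to exploit character n-grams in an RNN language model is to embed each n-gram and sum the embeddings into a word vector before the recurrent layer. The sketch below shows that construction under assumed names (`char_ngrams`, `CharNgramRNNLM`); the paper's exact composition may differ.

```python
import torch
import torch.nn as nn

def char_ngrams(word, n=3):
    """Character n-grams with boundary markers, e.g. 'cat' -> ['<ca', 'cat', 'at>']."""
    padded = f"<{word}>"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]

class CharNgramRNNLM(nn.Module):
    """RNN language model whose word inputs are sums of character n-gram embeddings."""

    def __init__(self, ngram_vocab, word_vocab, dim=128):
        super().__init__()
        self.ngram_embed = nn.Embedding(ngram_vocab, dim, padding_idx=0)
        self.rnn = nn.LSTM(dim, dim, batch_first=True)
        self.out = nn.Linear(dim, word_vocab)

    def forward(self, ngram_ids):
        # ngram_ids: (batch, seq_len, max_ngrams); index 0 is padding and embeds to zeros.
        word_vecs = self.ngram_embed(ngram_ids).sum(dim=2)
        hidden, _ = self.rnn(word_vecs)
        return self.out(hidden)  # next-word logits at each position

lm = CharNgramRNNLM(ngram_vocab=1000, word_vocab=5000)
logits = lm(torch.randint(0, 1000, (2, 7, 5)))  # batch=2, seq_len=7, 5 n-grams per word
```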