Analysis methods in neural language processing: A survey

Y Belinkov, J Glass - … of the Association for Computational Linguistics, 2019 - direct.mit.edu
The field of natural language processing has seen impressive progress in recent years, with
neural network models replacing many of the traditional systems. A plethora of new models …

Linguistic Knowledge and Transferability of Contextual Representations

NF Liu - arXiv preprint arXiv:1903.08855, 2019 - fq.pkwyx.com
Contextual word representations derived from large-scale neural language models are
successful across a diverse set of NLP tasks, suggesting that they encode useful and …

Don't give me the details, just the summary! Topic-aware convolutional neural networks for extreme summarization

S Narayan, SB Cohen, M Lapata - arXiv preprint arXiv:1808.08745, 2018 - arxiv.org
We introduce extreme summarization, a new single-document summarization task which
does not favor extractive strategies and calls for an abstractive modeling approach. The idea …

Understanding neural networks through representation erasure

J Li, W Monroe, D Jurafsky - arXiv preprint arXiv:1612.08220, 2016 - arxiv.org
While neural networks have been successfully applied to many natural language processing
tasks, they come at the cost of interpretability. In this paper, we propose a general …

Identifying and controlling important neurons in neural machine translation

A Bau, Y Belinkov, H Sajjad, N Durrani, F Dalvi… - arXiv preprint arXiv …, 2018 - arxiv.org
Neural machine translation (NMT) models learn representations containing substantial
linguistic information. However, it is not clear if such information is fully distributed or if some …

What is one grain of sand in the desert? Analyzing individual neurons in deep NLP models

F Dalvi, N Durrani, H Sajjad, Y Belinkov, A Bau… - Proceedings of the AAAI …, 2019 - aaai.org
Despite the remarkable evolution of deep neural networks in natural language processing
(NLP), their interpretability remains a challenge. Previous work largely focused on what …

Curriculum learning and minibatch bucketing in neural machine translation

T Kocmi, O Bojar - arXiv preprint arXiv:1707.09533, 2017 - arxiv.org
We examine the effects of particular orderings of sentence pairs on the on-line training of
neural machine translation (NMT). We focus on two types of such orderings: (1) ensuring that …

Analyzing redundancy in pretrained transformer models

F Dalvi, H Sajjad, N Durrani, Y Belinkov - arXiv preprint arXiv:2004.04010, 2020 - arxiv.org
Transformer-based deep NLP models are trained using hundreds of millions of parameters,
limiting their applicability in computationally constrained environments. In this paper, we …

Similarity analysis of contextual word representation models

JM Wu, Y Belinkov, H Sajjad, N Durrani, F Dalvi… - arXiv preprint arXiv …, 2020 - arxiv.org
This paper investigates contextual word representation models from the lens of similarity
analysis. Given a collection of trained models, we measure the similarity of their internal …

Breaking the beam search curse: A study of (re-) scoring methods and stopping criteria for neural machine translation

Y Yang, L Huang, M Ma - arXiv preprint arXiv:1808.09582, 2018 - arxiv.org
Beam search is widely used in neural machine translation, and usually improves translation
quality compared to greedy search. However, it has been widely observed that beam sizes …