TreeMix: Compositional constituency-based data augmentation for natural language understanding

L Zhang, Z Yang, D Yang - arXiv preprint arXiv:2205.06153, 2022 - arxiv.org
Data augmentation is an effective approach to tackle overfitting. Many previous works have
proposed different data augmentation strategies for NLP, such as noise injection, word …

Substructure substitution: Structured data augmentation for NLP

H Shi, K Livescu, K Gimpel - arXiv preprint arXiv:2101.00411, 2021 - arxiv.org
We study a family of data augmentation methods, substructure substitution (SUB2), for
natural language processing (NLP) tasks. SUB2 generates new examples by substituting …
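The core idea of SUB2 is to build new examples by swapping substructures (e.g., constituency subtrees) that share the same label across training sentences. A minimal sketch of that idea, under illustrative assumptions not taken from the paper: trees are nested tuples `(label, children...)`, leaves are token strings, and the toy parses and labels below are invented for demonstration.

```python
import random

def collect(tree, pool):
    """Record every labeled subtree in pool, keyed by its phrase label."""
    label, children = tree[0], tree[1:]
    pool.setdefault(label, []).append(tree)
    for c in children:
        if isinstance(c, tuple):
            collect(c, pool)

def substitute(tree, target, replacement):
    """Return a copy of tree with the subtree `target` replaced."""
    if tree is target:
        return replacement
    return (tree[0],) + tuple(
        substitute(c, target, replacement) if isinstance(c, tuple) else c
        for c in tree[1:]
    )

def leaves(tree):
    """Flatten a tree back into its token sequence."""
    out = []
    for c in tree[1:]:
        out.extend(leaves(c) if isinstance(c, tuple) else [c])
    return out

# Two toy parses; substitute the NP of one with an NP from the other.
t1 = ("S", ("NP", "the", "dog"), ("VP", "ran"))
t2 = ("S", ("NP", "a", "cat"), ("VP", "slept"))

pool = {}
collect(t2, pool)
donor = random.choice(pool["NP"])           # here: ("NP", "a", "cat")
augmented = substitute(t1, t1[1], donor)    # swap in the same-label subtree
print(" ".join(leaves(augmented)))          # → "a cat ran"
```

The same-label constraint is what keeps the output grammatical: an NP slot is refilled only with another NP.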

Audio-visual neural syntax acquisition

CIJ Lai, F Shi, P Peng, Y Kim, K Gimpel… - 2023 IEEE Automatic …, 2023 - ieeexplore.ieee.org
We study phrase structure induction from visually-grounded speech. The core idea is to first
segment the speech waveform into sequences of word segments, and subsequently induce …

Unsupervised chunking with hierarchical RNN

Z Wu, AA Deshmukh, Y Wu, J Lin, L Mou - arXiv preprint arXiv:2309.04919, 2023 - arxiv.org
In Natural Language Processing (NLP), predicting linguistic structures, such as parsing and
chunking, has mostly relied on manual annotations of syntactic structures. This paper …

PCFGs can do better: Inducing probabilistic context-free grammars with many symbols

S Yang, Y Zhao, K Tu - arXiv preprint arXiv:2104.13727, 2021 - arxiv.org
Probabilistic context-free grammars (PCFGs) with neural parameterization have been shown
to be effective in unsupervised phrase-structure grammar induction. However, due to the …

Heads-up! Unsupervised constituency parsing via self-attention heads

B Li, T Kim, RK Amplayo, F Keller - arXiv preprint arXiv:2010.09517, 2020 - arxiv.org
Transformer-based pre-trained language models (PLMs) have dramatically improved the
state of the art in NLP across many tasks. This has led to substantial interest in analyzing the …

Data augmentation for machine translation via dependency subtree swapping

A Nagy, DP Lakatos, B Barta, P Nanys, J Ács - arXiv preprint arXiv …, 2023 - arxiv.org
We present a generic framework for data augmentation via dependency subtree swapping
that is applicable to machine translation. We extract corresponding subtrees from the …
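The mechanism here is swapping corresponding dependency subtrees between parallel sentence pairs, on the source and target side simultaneously. A hedged sketch under assumptions not from the paper: projective subtrees are contiguous token spans, and aligned spans for the same dependency relation are already given; the sentence pairs and spans below are toy examples.

```python
def swap_span(tokens, span, replacement):
    """Replace tokens[span[0]:span[1]] with the replacement tokens."""
    i, j = span
    return tokens[:i] + replacement + tokens[j:]

# Two toy English-German pairs; swap the object ("obj") subtrees.
src_a, tgt_a = "she reads the book".split(), "sie liest das Buch".split()
src_b, tgt_b = "he buys a car".split(), "er kauft ein Auto".split()

obj_a_src, obj_a_tgt = (2, 4), (2, 4)   # "the book" / "das Buch"
obj_b_src, obj_b_tgt = (2, 4), (2, 4)   # "a car"    / "ein Auto"

# Swap on both sides so the new pair stays a valid translation.
new_src = swap_span(src_a, obj_a_src, src_b[obj_b_src[0]:obj_b_src[1]])
new_tgt = swap_span(tgt_a, obj_a_tgt, tgt_b[obj_b_tgt[0]:obj_b_tgt[1]])
print(" ".join(new_src), "|", " ".join(new_tgt))
# → "she reads a car | sie liest ein Auto"
```

Swapping the same relation on both sides is the key design point: it yields a new, still-aligned sentence pair rather than two independently perturbed sentences.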

Learning a grammar inducer from massive uncurated instructional videos

S Zhang, L Song, L Jin, H Mi, K Xu, D Yu… - arXiv preprint arXiv …, 2022 - arxiv.org
Video-aided grammar induction aims to leverage video information for finding more accurate
syntactic grammars for accompanying text. While previous work focuses on building systems …

Revisiting the practical effectiveness of constituency parse extraction from pre-trained language models

T Kim - arXiv preprint arXiv:2211.00479, 2022 - arxiv.org
Constituency Parse Extraction from Pre-trained Language Models (CPE-PLM) is a recent
paradigm that attempts to induce constituency parse trees relying only on the internal …

Unsupervised discontinuous constituency parsing with mildly context-sensitive grammars

S Yang, RP Levy, Y Kim - arXiv preprint arXiv:2212.09140, 2022 - arxiv.org
We study grammar induction with mildly context-sensitive grammars for unsupervised
discontinuous parsing. Using the probabilistic linear context-free rewriting system (LCFRS) …