The Zeno's Paradox of 'Low-Resource' Languages

HH Nigatu, AL Tonja, B Rosman, T Solorio… - arXiv preprint arXiv …, 2024 - arxiv.org
The disparity in the languages commonly studied in Natural Language Processing (NLP) is
typically reflected by referring to languages as low vs high-resourced. However, there is …

DM-BLI: Dynamic multiple subspaces alignment for unsupervised bilingual lexicon induction

L Hu, Y Xu - Proceedings of the 62nd Annual Meeting of the …, 2024 - aclanthology.org
Unsupervised bilingual lexicon induction (BLI) task aims to find word translations between
languages and has achieved great success in similar language pairs. However, related …

Isovec: Controlling the relative isomorphism of word embedding spaces

K Marchisio, N Verma, K Duh, P Koehn - arXiv preprint arXiv:2210.05098, 2022 - arxiv.org
The ability to extract high-quality translation dictionaries from monolingual word embedding
spaces depends critically on the geometric similarity of the spaces--their degree of "…

Graph-based multilingual label propagation for low-resource part-of-speech tagging

A Imani, S Severini, MJ Sabet, F Yvon… - arXiv preprint arXiv …, 2022 - arxiv.org
Part-of-Speech (POS) tagging is an important component of the NLP pipeline, but many low-
resource languages lack labeled data for training. An established method for training a POS …

Do not neglect related languages: The case of low-resource Occitan cross-lingual word embeddings

L Woller, V Hangya, A Fraser - … of the 1st Workshop on Multilingual …, 2021 - aclanthology.org
Cross-lingual word embeddings (CLWEs) have proven indispensable for various natural
language processing tasks, e.g., bilingual lexicon induction (BLI). However, the lack of data …

How to encode arbitrarily complex morphology in word embeddings, no corpus needed

L Schwartz, C Haley, F Tyers - … of the first workshop on NLP …, 2022 - aclanthology.org
In this paper, we present a straightforward technique for constructing interpretable word
embeddings from morphologically analyzed examples (such as interlinear glosses) for all of …

Improving translation of out of vocabulary words using bilingual lexicon induction in low-resource machine translation

J Waldendorf, A Birch, B Haddow… - … Biennial Conference of …, 2022 - research.ed.ac.uk
Dictionary-based data augmentation techniques have been used in the field of domain
adaptation to learn words that do not appear in the parallel training data of a machine …

Multilingual Word Embeddings for Low-Resource Languages using Anchors and a Chain of Related Languages

V Hangya, S Severini, R Ralev, A Fraser… - arXiv preprint arXiv …, 2023 - arxiv.org
Very low-resource languages, having only a few million tokens worth of data, are not well-
supported by multilingual NLP approaches due to poor quality cross-lingual word …

A survey of neural-network-based methods utilising comparable data for finding translation equivalents

M Denisová, P Rychlý - arXiv preprint arXiv:2410.15144, 2024 - arxiv.org
The importance of inducing bilingual dictionary components in many natural language
processing (NLP) applications is indisputable. However, the dictionary compilation process …

Multilinguality from Static Embedding Spaces: Algorithmic, Geometric, and Data Considerations

KV Marchisio - 2023 - jscholarship.library.jhu.edu
To date, most work towards developing natural language processing (NLP) technologies
has focused on the English language. At the same time, there are an estimated 7000+ living …