Findings of the AmericasNLP 2021 shared task on open machine translation for indigenous languages of the Americas

M Mager, A Oncevay, A Ebrahimi, J Ortega… - Proceedings of the …, 2021 - aclanthology.org
This paper presents the results of the 2021 Shared Task on Open Machine Translation for
Indigenous Languages of the Americas. The shared task featured two independent tracks …

When is multilinguality a curse? Language modeling for 250 high- and low-resource languages

TA Chang, C Arnett, Z Tu, BK Bergen - arXiv preprint arXiv:2311.09205, 2023 - arxiv.org
Multilingual language models are widely used to extend NLP systems to low-resource
languages. However, concrete evidence for the effects of multilinguality on language …

Transcending language boundaries: Harnessing LLMs for low-resource language translation

P Shu, J Chen, Z Liu, H Wang, Z Wu, T Zhong… - arXiv preprint arXiv …, 2024 - arxiv.org
Large Language Models (LLMs) have demonstrated remarkable success across a wide
range of tasks and domains. However, their performance in low-resource language …

Goldfish: Monolingual language models for 350 languages

TA Chang, C Arnett, Z Tu, BK Bergen - arXiv preprint arXiv:2408.10441, 2024 - arxiv.org
For many low-resource languages, the only available language models are large
multilingual models trained on many languages simultaneously. However, using FLORES …

APNN-TC: Accelerating arbitrary precision neural networks on Ampere GPU tensor cores

B Feng, Y Wang, T Geng, A Li, Y Ding - Proceedings of the international …, 2021 - dl.acm.org
Over the years, accelerating neural networks with quantization has been widely studied.
Unfortunately, prior efforts with diverse precisions (e.g., 1-bit weights and 2-bit activations) are …

Not always about you: Prioritizing community needs when developing endangered language technology

Z Liu, C Richardson, R Hatcher Jr… - arXiv preprint arXiv …, 2022 - arxiv.org
Languages are classified as low-resource when they lack the quantity of data necessary for
training statistical and machine learning tools and models. Causes of resource scarcity vary …

Hierarchical phrase-based sequence-to-sequence learning

B Wang, I Titov, J Andreas, Y Kim - arXiv preprint arXiv:2211.07906, 2022 - arxiv.org
We describe a neural transducer that maintains the flexibility of standard sequence-to-
sequence (seq2seq) models while incorporating hierarchical phrases as a source of …

Neural machine translation for the indigenous languages of the Americas: An introduction

M Mager, R Bhatnagar, G Neubig, NT Vu… - arXiv preprint arXiv …, 2023 - arxiv.org
Neural models have drastically advanced state of the art for machine translation (MT)
between high-resource languages. Traditionally, these models rely on large amounts of …

Compare encoder-decoder, encoder-only, and decoder-only architectures for text generation on low-resource datasets

PX Cai, YC Fan, FY Leu - Advances on Broad-Band Wireless Computing …, 2022 - Springer
Natural language generation (NLG) tasks have received significant research attention in
recent years. For tackling various NLG tasks, the Transformer [27] is now consensus to be …

BLEU Skies for Endangered Language Revitalization: Lemko Rusyn and Ukrainian Neural AI Translation Accuracy Soars

P Orynycz - International Conference on Human-Computer …, 2023 - Springer
Accelerating global language loss, associated with elevated incidence of illicit substance
use, type 2 diabetes, binge drinking, and assault, as well as sixfold higher youth suicide …