Low-resource languages: A review of past work and future challenges

A Magueresse, V Carles, E Heetderks - arXiv preprint arXiv:2006.07264, 2020 - arxiv.org
A current problem in NLP is massaging and processing low-resource languages which lack
useful training attributes such as supervised data, number of native speakers or experts, etc …

Findings of the WMT 2016 bilingual document alignment shared task

C Buck, P Koehn - Proceedings of the First Conference on …, 2016 - aclanthology.org
Findings of the WMT 2016 Bilingual Document Alignment Shared Task. Proceedings of
the First Conference on Machine Translation, Volume 2: Shared Task Papers, pages 554–563 …

Exploiting bilingual lexicons to improve multilingual embedding-based document and sentence alignment for low-resource languages

A Fernando, S Ranathunga, D Sachintha… - … and Information Systems, 2023 - Springer
Neural machine translation systems trained on low-resource languages produce sub-optimal
results due to the scarcity of large parallel datasets. To alleviate this problem …

Massively Multilingual Document Alignment with Cross-lingual Sentence-Mover's Distance

A El-Kishky, F Guzmán - arXiv preprint arXiv:2002.00761, 2020 - arxiv.org
Document alignment aims to identify pairs of documents in two distinct languages that are of
comparable content or translations of each other. Such aligned data can be used for a …

Efficient document alignment across scenarios

A Azpeitia, T Etchegoyhen - Machine Translation, 2019 - Springer
We present and evaluate an approach to document alignment meant for efficiency and
portability, as it relies on automatically extracted lexical translations and simple set-theoretic …

Extraction of Parallel Sentences

S Sharoff, R Rapp, P Zweigenbaum - Building and Using Comparable …, 2023 - Springer
As explained in Chap. 1 and later developed in Chap. 6, Machine Translation (MT) engines
need to be trained with large numbers of parallel sentences or segments. The quantity and …

Other Applications of Comparable Corpora

S Sharoff, R Rapp, P Zweigenbaum - Building and Using Comparable …, 2023 - Springer
This section concerns applications of comparable corpora beyond pure machine translation.
It has been argued that downstream applications such as cross-lingual document …

Building Comparable Corpora

S Sharoff, R Rapp, P Zweigenbaum - Building and Using Comparable …, 2023 - Springer
In a parallel corpus we know which document is a translation of what by design. If the link
between documents in different languages is not known, it needs to be established. In this …

Detecting Fine-Grained Semantic Divergences to Improve Translation Understanding Across Languages

E Briakou - 2023 - search.proquest.com
One of the core goals of Natural Language Processing (NLP) is to develop computational
representations and methods to compare and contrast text meaning across languages. Such …

Text mining at multiple granularity: leveraging subwords, words, phrases, and sentences

AH El-Kishky - 2020 - ideals.illinois.edu
With the rapid digitization of information, large quantities of text-heavy data are being
constantly generated in many languages and across domains such as web documents …