A supervised word alignment method based on cross-language span prediction using multilingual BERT

M Nagata, C Katsuki, M Nishino - arXiv preprint arXiv:2004.14516, 2020 - arxiv.org
We present a novel supervised word alignment method based on cross-language span
prediction. We first formalize a word alignment problem as a collection of independent …

WSPAlign: Word alignment pre-training via large-scale weakly supervised span prediction

Q Wu, M Nagata, Y Tsuruoka - arXiv preprint arXiv:2306.05644, 2023 - arxiv.org
Most existing word alignment methods rely on manual alignment datasets or parallel
corpora, which limits their usefulness. Here, to mitigate the dependence on manual data, we …

Evaluating automatic sentence alignment approaches on English-Slovak sentences

F Forgac, D Munkova, M Munk, L Kelebercova - Scientific Reports, 2023 - nature.com
Parallel texts represent a very valuable resource in many applications of natural language
processing. The fundamental step in creating parallel corpus is the alignment. Sentence …

Domain adaptation of machine translation with crowdworkers

M Morishita, J Suzuki, M Nagata - arXiv preprint arXiv:2210.15861, 2022 - arxiv.org
Although a machine translation model trained with a large in-domain parallel corpus
achieves remarkable results, it still works poorly when no in-domain data are available. This …

KC4MT: A high-quality corpus for multilingual machine translation

V Van Nguyen, H Nguyen, HT Le… - Proceedings of the …, 2022 - aclanthology.org
The multilingual parallel corpus is an important resource for many applications of natural
language processing (NLP). For machine translation, the size and quality of the training …

Word Alignment as Preference for Machine Translation

Q Wu, M Nagata, Z Miao, Y Tsuruoka - arXiv preprint arXiv:2405.09223, 2024 - arxiv.org
The problem of hallucination and omission, a long-standing problem in machine translation
(MT), is more pronounced when a large language model (LLM) is used in MT because an …

The effect of alignment correction on cross-lingual annotation projection

S Behzad, S Ebner, M Marone… - Proceedings of the …, 2023 - aclanthology.org
Cross-lingual annotation projection is a practical method for improving performance on low
resource structured prediction tasks. An important step in annotation projection is obtaining …

Word Sense Disambiguation for Ancient Greek: Sourcing a training corpus through translation alignment

A Keersmaekers, W Mercelis… - Proceedings of the …, 2023 - aclanthology.org
This paper seeks to leverage translations of Ancient Greek texts to enhance the performance
of automatic word sense disambiguation (WSD). Satisfactory WSD in Ancient Greek is …

KC4Align: Improving sentence alignment method for low-resource language pairs

HN Tien, DN Huu, H Le Thanh, VN Van… - Proceedings of the …, 2021 - aclanthology.org
Bilingual corpus is an important resource for many applications of natural language
processing (NLP). About low-resource language pairs, it is more necessary to build them …

Derin Öğrenme Kullanılarak İngilizce–Türkçe Çeviriler İçin Cümle Eşleme Sistemi [A Sentence Alignment System for English–Turkish Translations Using Deep Learning]

E Kızılırmak - 2023 - search.proquest.com
Natural language processing has gained importance in recent years with advances in artificial
intelligence and linguistics. The one-to-one "Turkish-to-English and English-to-Turkish" translations produced by translation companies …