Most current machine translation models are trained with parallel corpora, and their translation accuracy largely depends on the quality and quantity of the corpora. Although …
Machine translation has achieved impressive performance thanks to recent advances in deep learning and the availability of large-scale parallel corpora. There have …
L Lowphansirikul, C Polpanumas… - Language Resources …, 2022 - Springer
The primary objective of our work is to build a large-scale English–Thai dataset for training neural machine translation models. We construct scb-mt-en-th-2020, an English–Thai …
Web-crawled data provides a good source of parallel corpora for training machine translation models. It is automatically obtained, but extremely noisy, and recent work shows …
NL Pham, TV Pham - IEEE Access, 2023 - ieeexplore.ieee.org
The translation quality of machine translation systems depends on the parallel corpus used for training, particularly on the quantity and quality of the corpus. However, building a high …
Machine translation systems achieve near human-level performance on some languages, yet their effectiveness strongly relies on the availability of large amounts of parallel …
Although document-level machine translation has inherent advantages over sentence-level machine translation due to additional information available to a model from …
A Imankulova, T Sato, M Komachi - … of the 4th Workshop on Asian …, 2017 - aclanthology.org
Large-scale parallel corpora are indispensable to train highly accurate machine translators. However, manually constructed large-scale parallel corpora are not freely available in many …
J Wu, X Wang, WY Wang - arXiv preprint arXiv:1904.02331, 2019 - arxiv.org
The overreliance on large parallel corpora significantly limits the applicability of machine translation systems to the majority of language pairs. Back-translation has been dominantly …