Related articles - Academic Resource Search

JParaCrawl: A large scale web-based English-Japanese parallel corpus

M Morishita, J Suzuki, M Nagata - arXiv preprint arXiv:1911.10668, 2019 - arxiv.org

Recent machine translation algorithms mainly rely on parallel corpora. However, since the
availability of parallel corpora remains limited, only some resource-rich language pairs can …

Cited by 66 - Related articles - All 4 versions

[PDF] arxiv.org

JParaCrawl v3.0: A large-scale English-Japanese parallel corpus

M Morishita, K Chousa, J Suzuki, M Nagata - arXiv preprint arXiv …, 2022 - arxiv.org

Most current machine translation models are mainly trained with parallel corpora, and their
translation accuracy largely depends on the quality and quantity of the corpora. Although …

Cited by 29 - Related articles - All 7 versions

[PDF] openreview.net

Unsupervised machine translation using monolingual corpora only

G Lample, A Conneau, L Denoyer… - arXiv preprint arXiv …, 2017 - arxiv.org

Machine translation has recently achieved impressive performance thanks to recent
advances in deep learning and the availability of large-scale parallel corpora. There have …

Cited by 1309 - Related articles - All 6 versions

A large English–Thai parallel corpus from the web and machine-generated text

L Lowphansirikul, C Polpanumas… - Language Resources …, 2022 - Springer

The primary objective of our work is to build a large-scale English–Thai dataset for training
neural machine translation models. We construct scb-mt-en-th-2020, an English–Thai …

Cited by 29 - Related articles - All 4 versions

[PDF] arxiv.org

Parallel corpus filtering via pre-trained language models

B Zhang, A Nagesh, K Knight - arXiv preprint arXiv:2005.06166, 2020 - arxiv.org

Web-crawled data provides a good source of parallel corpora for training machine
translation models. It is automatically obtained, but extremely noisy, and recent work shows …

Cited by 32 - Related articles - All 3 versions

[PDF] ieee.org

A data augmentation method for English-Vietnamese neural machine translation

NL Pham, TV Pham - IEEE Access, 2023 - ieeexplore.ieee.org

The translation quality of machine translation systems depends on the parallel corpus used
for training, particularly on the quantity and quality of the corpus. However, building a high …

Cited by 13 - Related articles - All 2 versions

[PDF] aclanthology.org

Phrase-based & neural unsupervised machine translation

G Lample, M Ott, A Conneau, L Denoyer… - arXiv preprint arXiv …, 2018 - arxiv.org

Machine translation systems achieve near human-level performance on some languages,
yet their effectiveness strongly relies on the availability of large amounts of parallel …

Cited by 803 - Related articles - All 6 versions

[PDF] ed.ac.uk

Document-level machine translation with large-scale public parallel corpora

P Pal, A Birch-Mayne, K Heafield - The 62nd Annual Meeting of …, 2024 - research.ed.ac.uk

Despite the fact that document-level machine translation has inherent advantages over
sentence-level machine translation due to additional information available to a model from …

Cited by 2 - Related articles - All 5 versions

[PDF] aclanthology.org

Improving low-resource neural machine translation with filtered pseudo-parallel corpus

A Imankulova, T Sato, M Komachi - … of the 4th Workshop on Asian …, 2017 - aclanthology.org

Large-scale parallel corpora are indispensable to train highly accurate machine translators.
However, manually constructed large-scale parallel corpora are not freely available in many …

Cited by 61 - Related articles - All 3 versions

[PDF] arxiv.org

Extract and edit: An alternative to back-translation for unsupervised neural machine translation

J Wu, X Wang, WY Wang - arXiv preprint arXiv:1904.02331, 2019 - arxiv.org

The overreliance on large parallel corpora significantly limits the applicability of machine
translation systems to the majority of language pairs. Back-translation has been dominantly …

Cited by 50 - Related articles - All 3 versions
