Unsupervised domain clusters in pretrained language models

R Aharoni, Y Goldberg - arXiv preprint arXiv:2004.02105, 2020 - arxiv.org
The notion of "in-domain data" in NLP is often over-simplistic and vague, as textual data
varies in many nuanced linguistic aspects such as topic, style or level of formality. In …

Dynamic data selection for neural machine translation

M Van Der Wees, A Bisazza, C Monz - arXiv preprint arXiv:1708.00712, 2017 - arxiv.org
Intelligent selection of training data has proven a successful technique to simultaneously
increase training efficiency and translation performance for phrase-based machine …

Incorporating Collaborative and Active Learning Strategies in the Design and Deployment of a Master Course on Computer-Assisted Scientific Translation

M Zappatore - Technology, Knowledge and Learning, 2024 - Springer
This research aims to address the current gaps in computer-assisted translation (CAT)
courses offered in bachelor's and master's programmes in scientific and technical translation …

Extracting in-domain training corpora for neural machine translation using data selection methods

C Cruz Silva, CH Liu, A Poncelas, A Way - 2018 - doras.dcu.ie
Data selection is a process used in selecting a subset of parallel data for the training of
machine translation (MT) systems, so that 1) resources for training might be reduced, 2) …

[PDF][PDF] Translation quality and productivity: A study on rich morphology languages

L Specia, K Harris, F Blain, A Burchardt… - … XVI: Research Track, 2017 - aclanthology.org
This paper introduces a unique large-scale machine translation dataset with various levels
of human annotation combined with automatically recorded productivity features such as …

Automatic document selection for efficient encoder pretraining

Y Feng, P Xia, B Van Durme, J Sedoc - arXiv preprint arXiv:2210.10951, 2022 - arxiv.org
Building pretrained language models is considered expensive and data-intensive, but must
we increase dataset size to achieve better performance? We propose an alternative to larger …

Active learning for neural machine translation

P Zhang, X Xu, D Xiong - 2018 International Conference on …, 2018 - ieeexplore.ieee.org
Neural machine translation (NMT) normally requires a large bilingual corpus to train a high-
translation-quality model. However, building such parallel corpora for many low-resource …
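The uncertainty-based selection that active-learning NMT work of this kind typically uses can be sketched as follows. This is a toy illustration, not the authors' method: the function names are hypothetical, and the per-token probabilities stand in for what a real system would read off the NMT decoder's softmax outputs.

```python
import math

def uncertainty(token_probs):
    """Length-normalized negative log-likelihood of the model's own
    translation; higher values mean the NMT model is less confident."""
    return -sum(math.log(p) for p in token_probs) / len(token_probs)

def pick_for_annotation(pool, budget):
    """pool: list of (source_sentence, token_probs) pairs, where
    token_probs are the model's probabilities for its output tokens
    (stand-ins here). Returns the `budget` least-confident sources,
    i.e. the ones most worth sending for human translation."""
    ranked = sorted(pool, key=lambda sp: uncertainty(sp[1]), reverse=True)
    return [src for src, _ in ranked[:budget]]
```

Under this sketch, a sentence translated with uniformly low token probabilities outranks a confidently translated one, so the annotation budget is spent where the model is weakest.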

Separating grains from the chaff: Using data filtering to improve multilingual translation for low-resourced African languages

I Abdulmumin, M Beukman, JO Alabi, C Emezue… - arXiv preprint arXiv …, 2022 - arxiv.org
We participated in the WMT 2022 Large-Scale Machine Translation Evaluation for the
African Languages Shared Task. This work describes our approach, which is based on …

Adaptive Modeling of Uncertainties for Traffic Forecasting

Y Wu, Y Ye, A Zeb, JJ Yu… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
Deep neural networks (DNNs) have emerged as a dominant approach for developing traffic
forecasting models. These models are typically trained to minimize error on averaged test …

Feature decay algorithms for neural machine translation

Neural Machine Translation (NMT) systems require a lot of data to be competitive. For this
reason, data selection techniques are used only for fine-tuning systems that have been …
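The feature-decay idea named in the title above can be sketched greedily: sentences are scored by their n-gram overlap with an in-domain seed, and each n-gram's value decays every time a selected sentence covers it, so the selection spreads over many features instead of repeating the same ones. A minimal sketch, assuming bigram features and a fixed decay factor (the function and parameter names are illustrative, not the paper's implementation):

```python
from collections import Counter

def fda_select(candidates, seed_ngrams, k, decay=0.5, n=2):
    """Greedily pick k sentences from `candidates` whose n-gram overlap
    with the in-domain `seed_ngrams` scores highest, halving each
    n-gram's contribution every time a selected sentence covers it."""
    counts = Counter()   # how often each seed n-gram has been covered
    pool = list(candidates)
    selected = []

    def ngrams(tokens):
        return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

    for _ in range(min(k, len(pool))):
        def score(sent):
            toks = sent.split()
            feats = [g for g in ngrams(toks) if g in seed_ngrams]
            # length-normalized sum of decayed feature values
            return sum(decay ** counts[g] for g in feats) / max(len(toks), 1)
        best = max(pool, key=score)
        pool.remove(best)
        selected.append(best)
        for g in ngrams(best.split()):
            if g in seed_ngrams:
                counts[g] += 1
    return selected
```

For example, with seed bigrams drawn from in-domain text about machine translation, a sentence sharing those bigrams is selected before unrelated sentences, and repeated coverage of the same bigrams is progressively discounted.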