Deterministic Reversible Data Augmentation for Neural Machine Translation

J Yao, H Huang, Z Liu, Y Guo - arXiv preprint arXiv:2406.02517, 2024 - arxiv.org
Data augmentation is an effective way to diversify corpora in machine translation, but
previous methods may introduce semantic inconsistency between original and augmented …

DiverSeg: Leveraging Diverse Segmentations with Cross-granularity Alignment for Neural Machine Translation

H Song, Z Mao, R Dabre, C Chu… - Journal of Natural …, 2024 - jstage.jst.go.jp
In this study, we proposed DiverSeg to exploit diverse segmentations from multiple subword
segmenters that capture the various perspectives of each word for neural machine …

SubMerge: Merging Equivalent Subword Tokenizations for Subword Regularized Models in Neural Machine Translation

H Song, F Meyer, R Dabre, H Tanaka, C Chu… - 2024 - pubs.cs.uct.ac.za
Subword regularized models leverage multiple subword tokenizations of one target
sentence during training. Previous decoding algorithms select one tokenization during …

Studies on Subword-based Low-Resource Neural Machine Translation: Segmentation, Encoding, and Decoding

S Haiyue - 2024 - repository.kulib.kyoto-u.ac.jp
In a world rich with diverse ideas and cultures, humans are isolated into islands of distinct
languages. Machine translation (MT) serves as a bridge, facilitating information access and …