Prompting large language model for machine translation: A case study

B Zhang, B Haddow, A Birch - International Conference on …, 2023 - proceedings.mlr.press
Research on prompting has shown that excellent performance is achievable with little or even
no supervised training across many tasks. However, prompting for machine translation is still under …

The unreasonable effectiveness of few-shot learning for machine translation

X Garcia, Y Bansal, C Cherry, G Foster… - International …, 2023 - proceedings.mlr.press
We demonstrate the potential of few-shot translation systems, trained with unpaired
language data, for both high and low-resource language pairs. We show that with only 5 …
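
Both entries above center on the same mechanism: a handful of translation demonstrations are placed in the model's context ahead of the sentence to be translated. A minimal sketch of that prompt construction follows; the German-English pairs and the `generate` stub are hypothetical placeholders, not taken from either paper.

```python
# Minimal sketch of few-shot prompting for translation. The demonstration
# pairs below are invented, and `generate` stands in for a real LLM call.

FEW_SHOT_PAIRS = [  # hypothetical (source, target) demonstrations
    ("Das Wetter ist heute schön.", "The weather is nice today."),
    ("Ich habe das Buch gelesen.", "I have read the book."),
]

def build_prompt(source: str) -> str:
    """Format k demonstration pairs followed by the new source sentence."""
    lines = [f"German: {src}\nEnglish: {tgt}" for src, tgt in FEW_SHOT_PAIRS]
    lines.append(f"German: {source}\nEnglish:")
    return "\n\n".join(lines)

def generate(prompt: str) -> str:
    """Stub standing in for an actual large language model decoding call."""
    raise NotImplementedError("plug in a real model here")

print(build_prompt("Wo ist der Bahnhof?"))
```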

Findings of the 2021 conference on machine translation (WMT21)

F Akhbardeh, A Arkhangorodsky, M Biesialska, O Bojar… - Proceedings of the …, 2021 - cris.fbk.eu
This paper presents the results of the news translation task, the multilingual low-resource
translation for Indo-European languages, the triangular translation task, and the automatic …

Dinoiser: Diffused conditional sequence learning by manipulating noises

J Ye, Z Zheng, Y Bao, L Qian, M Wang - arXiv preprint arXiv:2302.10025, 2023 - arxiv.org
While diffusion models have achieved great success in generating continuous signals such
as images and audio, it remains unclear how they can learn discrete sequence …
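
For readers unfamiliar with the setup this entry builds on: continuous diffusion over discrete text typically embeds the tokens and then corrupts the embeddings with scheduled Gaussian noise. Below is a toy sketch of that forward process under a standard DDPM-style schedule; the embedding table and schedule are stand-ins, not Dinoiser's actual configuration.

```python
# Toy forward diffusion over token embeddings. Everything here (vocab size,
# dimensions, linear schedule) is an illustrative assumption.
import numpy as np

rng = np.random.default_rng(0)
vocab_size, dim, T = 100, 16, 1000
embed = rng.normal(size=(vocab_size, dim))   # toy embedding table
betas = np.linspace(1e-4, 2e-2, T)           # linear noise schedule
alpha_bar = np.cumprod(1.0 - betas)          # cumulative signal level

def q_sample(token_ids, t):
    """Forward process: x_t = sqrt(a_bar_t) * x_0 + sqrt(1 - a_bar_t) * eps."""
    x0 = embed[token_ids]                    # (seq_len, dim)
    eps = rng.normal(size=x0.shape)
    return np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps

noisy = q_sample(np.array([3, 17, 42]), t=500)  # a noised "sentence"
print(noisy.shape)                              # (3, 16)
```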

On the learning of non-autoregressive transformers

F Huang, T Tao, H Zhou, L Li… - … Conference on Machine …, 2022 - proceedings.mlr.press
Non-autoregressive Transformer (NAT) is a family of text generation models that aims to
reduce decoding latency by predicting whole sentences in parallel. However, such …
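
To make "predicting whole sentences in parallel" concrete, here is a toy sketch of NAT-style decoding: one projection over all decoder states, one argmax per position, no left-to-right loop. Because positions are predicted independently, this is also where the multi-modality problem mentioned in later entries originates. The random matrices stand in for a trained model.

```python
# Toy illustration of non-autoregressive (parallel) decoding. The decoder
# states and output projection are random stand-ins for a trained model.
import numpy as np

rng = np.random.default_rng(0)
vocab, dim, tgt_len = 50, 8, 6
decoder_states = rng.normal(size=(tgt_len, dim))  # stand-in decoder output
proj = rng.normal(size=(dim, vocab))              # stand-in output projection

logits = decoder_states @ proj                    # (tgt_len, vocab), one pass
tokens = logits.argmax(axis=-1)                   # every position in parallel
print(tokens.tolist())
```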

Selective knowledge distillation for non-autoregressive neural machine translation

M Liu, Y Bao, C Zhao, S Huang - … of the AAAI Conference on Artificial …, 2023 - ojs.aaai.org
Benefiting from sequence-level knowledge distillation, the Non-Autoregressive
Transformer (NAT) achieves great success in neural machine translation tasks. However …
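
Sequence-level knowledge distillation, as used for NAT training, replaces the gold targets with an autoregressive teacher's own translations, which simplifies the data distribution the parallel student must fit. A minimal sketch follows; `teacher_translate` and `train_student` are hypothetical stand-ins for real model calls.

```python
# Sketch of sequence-level knowledge distillation for NAT training. Both
# callables are hypothetical placeholders for actual models.
from typing import Callable, List, Tuple

def distill_corpus(
    sources: List[str],
    teacher_translate: Callable[[str], str],
) -> List[Tuple[str, str]]:
    """Replace gold targets with teacher outputs (the distilled corpus)."""
    return [(src, teacher_translate(src)) for src in sources]

def train_student(corpus: List[Tuple[str, str]]) -> None:
    raise NotImplementedError("train the NAT student on the distilled pairs")

# Usage: distilled = distill_corpus(train_sources, teacher.translate)
```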

Diffusion language models can perform many tasks with scaling and instruction-finetuning

J Ye, Z Zheng, Y Bao, L Qian, Q Gu - arXiv preprint arXiv:2308.12219, 2023 - arxiv.org
The recent surge of generative AI has been fueled by the generative power of diffusion
probabilistic models and the scalable capabilities of large language models. Despite their …

switch-GLAT: Multilingual parallel machine translation via code-switch decoder

Z Song, H Zhou, L Qian, J Xu, S Cheng… - International …, 2022 - openreview.net
Multilingual machine translation aims to develop a single model for multiple language
directions. However, existing multilingual models based on Transformer are limited in terms …

Fuzzy alignments in directed acyclic graph for non-autoregressive machine translation

Z Ma, C Shao, S Gui, M Zhang, Y Feng - arXiv preprint arXiv:2303.06662, 2023 - arxiv.org
Non-autoregressive translation (NAT) reduces the decoding latency but suffers from
performance degradation due to the multi-modality problem. Recently, the structure of …

Diffusion Language Models Are Versatile Protein Learners

X Wang, Z Zheng, F Ye, D Xue, S Huang… - arXiv preprint arXiv …, 2024 - arxiv.org
This paper introduces the diffusion protein language model (DPLM), a versatile protein
language model that demonstrates strong generative and predictive capabilities for protein …