SeamlessM4T: Massively Multilingual & Multimodal Machine Translation

L Barrault, YA Chung, MC Meglioli, D Dale… - arXiv preprint arXiv …, 2023 - arxiv.org
What does it take to create the Babel Fish, a tool that can help individuals translate speech
between any two languages? While recent breakthroughs in text-based models have …

Augmented datasheets for speech datasets and ethical decision-making

O Papakyriakopoulos, ASG Choi, W Thong… - Proceedings of the …, 2023 - dl.acm.org
Speech datasets are crucial for training Speech Language Technologies (SLT); however,
the lack of diversity of the underlying training data can lead to serious limitations in building …

MISGENDERED: Limits of large language models in understanding pronouns

T Hossain, S Dev, S Singh - arXiv preprint arXiv:2306.03950, 2023 - arxiv.org
Content Warning: This paper contains examples of misgendering and erasure that could be
offensive and potentially triggering. Gender bias in language technologies has been widely …

Out-of-distribution generalization in natural language processing: Past, present, and future

L Yang, Y Song, X Ren, C Lyu, Y Wang… - Proceedings of the …, 2023 - aclanthology.org
Machine learning (ML) systems in natural language processing (NLP) face
significant challenges in generalizing to out-of-distribution (OOD) data, where the test …

Hi guys or hi folks? Benchmarking gender-neutral machine translation with the GeNTE corpus

A Piergentili, B Savoldi, D Fucci, M Negri… - arXiv preprint arXiv …, 2023 - arxiv.org
Gender inequality is embedded in our communication practices and perpetuated in
translation technologies. This becomes particularly apparent when translating into …

Exploiting biased models to de-bias text: A gender-fair rewriting model

C Amrhein, F Schottmann, R Sennrich… - arXiv preprint arXiv …, 2023 - arxiv.org
Natural language generation models reproduce and often amplify the biases present in their
training data. Previous research explored using sequence-to-sequence rewriting models to …

MT-GenEval: A counterfactual and contextual dataset for evaluating gender accuracy in machine translation

A Currey, M Nădejde, R Pappagari, M Mayer… - arXiv preprint arXiv …, 2022 - arxiv.org
As generic machine translation (MT) quality has improved, the need for targeted
benchmarks that explore fine-grained aspects of quality has increased. In particular, gender …

Contrastive conditioning for assessing disambiguation in MT: A case study of distilled bias

J Vamvas, R Sennrich - 2021 Conference on Empirical Methods …, 2021 - research.ed.ac.uk
Lexical disambiguation is a major challenge for machine translation systems, especially if
some senses of a word are trained less often than others. Identifying patterns of …

Test suites task: Evaluation of gender fairness in MT with MuST-SHE and INES

B Savoldi, M Gaido, M Negri, L Bentivogli - arXiv preprint arXiv …, 2023 - arxiv.org
As part of the WMT-2023 "Test suites" shared task, in this paper we summarize the results of
two test suites evaluations: MuST-SHE-WMT23 and INES. By focusing on the en-de and de …

A prompt response to the demand for automatic gender-neutral translation

B Savoldi, A Piergentili, D Fucci, M Negri… - arXiv preprint arXiv …, 2024 - arxiv.org
Gender-neutral translation (GNT) that avoids biased and undue binary assumptions is a
pivotal challenge for the creation of more inclusive translation technologies. Advancements …