Quality at a glance: An audit of web-crawled multilingual datasets J Kreutzer, I Caswell, L Wang, A Wahab, D van Esch, N Ulzii-Orshikh, ... Transactions of the Association for Computational Linguistics 10, 50-72, 2022 | 98 | 2022 |
Language ID in the Wild: Unexpected Challenges on the Path to a Thousand-Language Web Text Corpus I Caswell, T Breiner, D van Esch, A Bapna arXiv preprint arXiv:2010.14571, 2020 | 73 | 2020 |
Building Speech Recognition Systems for Language Documentation: The CoEDL Endangered Language Pipeline and Inference System B Foley, J Arnold, R Coto-Solano, G Durantin, TM Ellison, D van Esch, ... Proceedings of the 6th International Workshop on Spoken Language …, 2018 | 71 | 2018 |
Building Machine Translation Systems for the Next Thousand Languages A Bapna, I Caswell, J Kreutzer, O Firat, D van Esch, A Siddhant, M Niu, ... arXiv preprint arXiv:2205.03983, 2022 | 61 | 2022 |
How Might We Create Better Benchmarks for Speech Recognition? A Aksënova, D van Esch, J Flynn, P Golik Proceedings of the 1st Workshop on Benchmarking: Past, Present and Future, 22-34, 2021 | 33 | 2021 |
Writing Across the World's Languages: Deep Internationalization for Gboard, the Google Keyboard D van Esch, E Sarbar, T Lucassen, J O'Brien, T Breiner, M Prasad, E Crew, ... arXiv preprint arXiv:1912.01218, 2019 | 21 | 2019 |
Writing system and speaker metadata for 2,800+ language varieties D van Esch, T Lucassen, S Ruder, I Caswell, C Rivera Proceedings of the Thirteenth Language Resources and Evaluation Conference …, 2022 | 20 | 2022 |
Future directions in technological support for language documentation D van Esch, B Foley, N San Proceedings of the Workshop on Computational Methods for Endangered Languages 1, 2019 | 20 | 2019 |
An Expanded Taxonomy of Semiotic Classes for Text Normalization D van Esch, R Sproat Proceedings of Interspeech 2017, 4016-4020, 2017 | 20 | 2017 |
Leiden Weibo Corpus D van Esch | 20 | 2012 |
Accented Speech Recognition: Benchmarking, Pre-training, and Diverse Data A Aksënova, Z Chen, CC Chiu, D van Esch, P Golik, W Han, L King, ... arXiv preprint arXiv:2205.08014, 2022 | 18 | 2022 |
Building Large-Vocabulary ASR Systems for Languages Without Any Audio Training Data M Prasad, D van Esch, S Ritchie, JF Mortensen Proc. Interspeech 2019, 271-275, 2019 | 18 | 2019 |
Text Normalization Infrastructure that Scales to Hundreds of Language Varieties M Chua, D van Esch, N Coccaro, E Cho, S Bhandari, L Jia Proceedings of the 11th edition of the Language Resources and Evaluation …, 2018 | 18 | 2018 |
XTREME-S: Evaluating cross-lingual speech representations A Conneau, A Bapna, Y Zhang, M Ma, P von Platen, A Lozhkov, C Cherry, ... arXiv preprint arXiv:2203.10752, 2022 | 16 | 2022 |
Predicting Pronunciations with Syllabification and Stress with Recurrent Neural Networks D van Esch, M Chua, K Rao Proceedings of Interspeech 2016, 2841-2845, 2016 | 16 | 2016 |
Mining Training Data for Language Modeling across the World’s Languages M Prasad, T Breiner, D van Esch Proceedings of the 6th International Workshop on Spoken Language …, 2018 | 12 | 2018 |
Unified Verbalization for Speech Recognition & Synthesis Across Languages S Ritchie, R Sproat, K Gorman, D van Esch, C Schallhart, N Bampounis, ... Proc. Interspeech 2019, 3530-3534, 2019 | 10 | 2019 |
Developing Pronunciation Models in New Languages Faster by Exploiting Common Grapheme-to-Phoneme Correspondences Across Languages H Bleyan, S Ritchie, JF Mortensen, D van Esch Proc. Interspeech 2019, 2100-2104, 2019 | 7 | 2019 |
Large vocabulary speech recognition for languages of Africa: multilingual modeling and self-supervised learning S Ritchie, YC Cheng, M Chen, R Mathews, D van Esch, B Li, KC Sim arXiv preprint arXiv:2208.03067, 2022 | 6 | 2022 |
Data-Driven Parametric Text Normalization: Rapidly Scaling Finite-State Transduction Verbalizers to New Languages S Ritchie, E Mahon, K Heiligenstein, N Bampounis, D van Esch, ... Proceedings of the 1st Joint Workshop on Spoken Language Technologies for …, 2020 | 5 | 2020 |