Participatory research for low-resourced machine translation: A case study in African languages

W Nekoto, V Marivate, T Matsila, T Fasubaa… - arXiv preprint arXiv …, 2020 - arxiv.org
Research in NLP lacks geographic diversity, and the question of how NLP can be scaled to
low-resourced languages has not yet been adequately solved. "Low-resourced"-ness is a …

Spoken Spanish PoS tagging: gold standard dataset

JE Bonilla - Language Resources and Evaluation, 2024 - Springer
The development of a benchmark for part-of-speech (PoS) tagging of spoken dialectal
European Spanish is presented, which will serve as the foundation for a future treebank. The …

Text corpora and the challenge of newly written languages

A Millour, K Fort - Proceedings of the 1st Joint Workshop on …, 2020 - aclanthology.org
Text corpora represent the foundation on which most natural language processing systems
rely. However, for many languages, collecting or building a text corpus of a sufficient size still …

Unsupervised data augmentation for less-resourced languages with no standardized spelling

A Millour, K Fort - Proceedings of the International Conference on …, 2019 - aclanthology.org
Building representative linguistic resources and NLP tools for non-standardized languages
is challenging: when spelling is not determined by a norm, multiple written forms can be …

Katana and Grand Guru: a Game of the Lost Words

A Millour, MG Araneta, IL Konjik, A Raffone… - 9th Language & …, 2019 - hal.science
We present here a prototype of a role playing game which allows to both i) crowdsource
lexical units (including idioms) for a language and ii) help the player improve their …

À l'écoute des locuteurs: production participative de ressources langagières pour des langues non standardisées

A Millour, K Fort - Revue TAL: traitement automatique des langues, 2018 - hal.science
Citizen science, and in particular volunteer crowdsourcing, is a still
underused means of creating language resources for …

Juegos con propósito para la anotación del Corpus Oral Sonoro del Español rural

RLS Díaz, JE Bonilla, M Bouzouita… - Dialectologia et …, 2023 - degruyter.com
The study of dialectal microvariation in spoken Spanish faces challenges due to the
absence of an adequate morpho-syntactically annotated and parsed corpus. Therefore, this …

Krik: First steps into crowdsourcing POS tags for Kréyòl Gwadloupéyen

A Millour, K Fort - CCURL 2018, 2018 - hal.science
This article presents the adaptation to Guadeloupean Creole of a project of crowdsourcing
part-of-speech (POS) tags initially designed for a French regional language, Alsatian. We do …

Using GWAPs for Verifying PoS Tagging of Spoken Dialectal Spanish

JE Bonilla, RLS Díaz… - 2023 10th International …, 2023 - ieeexplore.ieee.org
Given the scarcity of linguistic resources available for spoken varieties, this paper explores
the use of gamified approaches for verifying Part of Speech (PoS) tagging of spoken …

Getting to Know the Speakers: a Survey of a Non-Standardized Language Digital Use

A Millour - 9th Language & Technology Conference: Human …, 2019 - hal.science
This paper presents the results of an on-line survey regarding the use on the Internet of a
less-resourced non-standardized language: Alsatian. The survey, entitled “Alsatian, the …