GLUECoS: An evaluation benchmark for code-switched NLP

S Khanuja, S Dandapat, A Srinivasan… - arXiv preprint arXiv …, 2020 - arxiv.org
Code-switching is the use of more than one language in the same conversation or utterance.
Recently, multilingual contextual embedding models, trained on multiple monolingual …

A survey of code-switched speech and language processing

S Sitaram, KR Chandu, SK Rallabandi… - arXiv preprint arXiv …, 2019 - arxiv.org
Code-switching, the alternation of languages within a conversation or utterance, is a
common communicative phenomenon that occurs in multilingual communities across the …

ViNLI: A Vietnamese corpus for studies on open-domain natural language inference

T Van Huynh, K Van Nguyen… - Proceedings of the 29th …, 2022 - aclanthology.org
Over a decade, the research field of computational linguistics has witnessed the growth of
corpora and models for natural language inference (NLI) for rich-resource languages such …

JamPatoisNLI: A Jamaican Patois natural language inference dataset

RA Armstrong, J Hewitt, C Manning - arXiv preprint arXiv:2212.03419, 2022 - arxiv.org
JamPatoisNLI provides the first dataset for natural language inference in a creole language,
Jamaican Patois. Many of the most-spoken low-resource languages are creoles. These …

Knowing What to Say: Towards knowledge grounded code-mixed response generation for open-domain conversations

GV Singh, M Firdaus, S Mishra, A Ekbal - Knowledge-Based Systems, 2022 - Elsevier
Inculcating knowledge in the dialogue agents is an important step towards creating any
agent more human-like. Hence, the use of knowledge while conversing is crucial for building …

Exploring methods for building dialects-Mandarin code-mixing corpora: A case study in Taiwanese Hokkien

SE Lu, BH Lu, CY Lu, RTH Tsai - arXiv preprint arXiv:2301.08937, 2023 - arxiv.org
In natural language processing (NLP), code-mixing (CM) is a challenging task, especially
when the mixed languages include dialects. In Southeast Asian countries such as …

Evaluating the Diversity, Equity and Inclusion of NLP Technology: A Case Study for Indian Languages

S Khanuja, S Ruder, P Talukdar - arXiv preprint arXiv:2205.12676, 2022 - arxiv.org
In order for NLP technology to be widely applicable, fair, and useful, it needs to serve a
diverse set of speakers across the world's languages, be equitable, i.e., not unduly biased …

Supervised and unsupervised evaluation of synthetic code-switching

E Orlov, E Artemova - Proceedings of the Eighth Workshop on …, 2022 - aclanthology.org
Code-switching (CS) is a phenomenon of mixing words and phrases from multiple
languages within a single sentence or conversation. The ever-growing amount of CS …

Towards code-mixed Hinglish dialogue generation

V Agarwal, P Rao, DB Jayagopi - … of the 3rd Workshop on Natural …, 2021 - aclanthology.org
Code-mixed language plays a crucial role in communication in multilingual societies.
Though the recent growth of web users has greatly boosted the use of such mixed …

Prabhupadavani: A code-mixed speech translation data for 25 languages

J Sandhan, A Daksh, OA Paranjay, L Behera… - arXiv preprint arXiv …, 2022 - arxiv.org
Nowadays, the interest in code-mixing has become ubiquitous in Natural Language
Processing (NLP); however, not much attention has been given to address this phenomenon …