QA dataset explosion: A taxonomy of NLP resources for question answering and reading comprehension

A Rogers, M Gardner, I Augenstein - ACM Computing Surveys, 2023 - dl.acm.org
Alongside huge volumes of research on deep learning models in NLP in the recent years,
there has been much work on benchmark datasets needed to track modeling progress …

SemEval-2020 task 9: Overview of sentiment analysis of code-mixed tweets

P Patwa, G Aguilar, S Kar, S Pandey, S Pykl… - arXiv preprint arXiv …, 2020 - arxiv.org
In this paper, we present the results of the SemEval-2020 Task 9 on Sentiment Analysis of
Code-Mixed Tweets (SentiMix 2020). We also release and describe our Hinglish (Hindi …

Automatic language identification in texts: A survey

T Jauhiainen, M Lui, M Zampieri, T Baldwin… - Journal of Artificial …, 2019 - jair.org
Language identification ("LI") is the problem of determining the natural language that a
document or part thereof is written in. Automatic LI has been extensively researched for over …

One country, 700+ languages: NLP challenges for underrepresented languages and dialects in Indonesia

AF Aji, GI Winata, F Koto, S Cahyawijaya… - arXiv preprint arXiv …, 2022 - arxiv.org
NLP research is impeded by a lack of resources and awareness of the challenges presented
by underrepresented languages and dialects. Focusing on the languages spoken in …

Multilingual and code-switching ASR challenges for low resource Indian languages

A Diwan, R Vaideeswaran, S Shah, A Singh… - arXiv preprint arXiv …, 2021 - arxiv.org
Recently, there is increasing interest in multilingual automatic speech recognition (ASR)
where a speech recognition system caters to multiple low resource languages by taking …

LinCE: A centralized benchmark for linguistic code-switching evaluation

G Aguilar, S Kar, T Solorio - arXiv preprint arXiv:2005.04322, 2020 - arxiv.org
Recent trends in NLP research have raised an interest in linguistic code-switching (CS);
modern approaches have been proposed to solve a wide range of NLP tasks on multiple …

Findings of the Shared Task on Offensive Span Identification from Code-Mixed Tamil-English Comments

M Ravikiran, BR Chakravarthi, AK Madasamy… - arXiv preprint arXiv …, 2022 - arxiv.org
Offensive content moderation is vital in social media platforms to support healthy online
discussions. However, their prevalence in codemixed Dravidian languages is limited to …

A survey of code-switching: Linguistic and social perspectives for language technologies

AS Doğruöz, S Sitaram, BE Bullock… - arXiv preprint arXiv …, 2023 - arxiv.org
The analysis of data in which multiple languages are represented has gained popularity
among computational linguists in recent years. So far, much of this research focuses mainly …

Opportunities and challenges of automatic speech recognition systems for low-resource language speakers

T Reitmaier, E Wallington, D Kalarikalayil Raju… - Proceedings of the …, 2022 - dl.acm.org
Automatic Speech Recognition (ASR) researchers are turning their attention towards
supporting low-resource languages, such as isiXhosa or Marathi, with only limited training …

Transformer-transducers for code-switched speech recognition

S Dalmia, Y Liu, S Ronanki… - ICASSP 2021-2021 IEEE …, 2021 - ieeexplore.ieee.org
We live in a world where 60% of the population can speak two or more languages fluently.
Members of these communities constantly switch between languages when having a …