NusaCrowd: Open source initiative for Indonesian NLP resources

S Cahyawijaya, H Lovenia, AF Aji… - Findings of the …, 2023 - aclanthology.org
We present NusaCrowd, a collaborative initiative to collect and unify existing resources for
Indonesian languages, including opening access to previously non-public resources …

Aya model: An instruction finetuned open-access multilingual language model

A Üstün, V Aryabumi, ZX Yong, WY Ko… - arXiv preprint arXiv …, 2024 - arxiv.org
Recent breakthroughs in large language models (LLMs) have centered around a handful of
data-rich languages. What does it take to broaden access to breakthroughs beyond first …

Multilingual large language model: A survey of resources, taxonomy and frontiers

L Qin, Q Chen, Y Zhou, Z Chen, Y Li, L Liao… - arXiv preprint arXiv …, 2024 - arxiv.org
Multilingual Large Language Models are capable of using powerful Large Language
Models to handle and respond to queries in multiple languages, which achieves remarkable …

Aya dataset: An open-access collection for multilingual instruction tuning

S Singh, F Vargus, D Dsouza, BF Karlsson… - arXiv preprint arXiv …, 2024 - arxiv.org
Datasets are foundational to many breakthroughs in modern artificial intelligence. Many
recent achievements in the space of natural language processing (NLP) can be attributed to …

GlobalBench: A benchmark for global progress in natural language processing

Y Song, C Cui, S Khanuja, P Liu, F Faisal… - arXiv preprint arXiv …, 2023 - arxiv.org
Despite the major advances in NLP, significant disparities in NLP system performance
across languages still exist. Arguably, these are due to uneven resource allocation and sub …

Findings of the 2023 ML-SUPERB challenge: Pre-training and evaluation over more languages and beyond

J Shi, W Chen, D Berrebbi, HH Wang… - 2023 IEEE Automatic …, 2023 - ieeexplore.ieee.org
The 2023 Multilingual Speech Universal Performance Benchmark (ML-SUPERB) Challenge
expands upon the acclaimed SUPERB framework, emphasizing self-supervised models in …

Leveraging the Multilingual Indonesian Ethnic Languages Dataset In Self-Supervised Models for Low-Resource ASR Task

S Sakti, BA Titalim - 2023 IEEE Automatic Speech Recognition …, 2023 - ieeexplore.ieee.org
Indonesia is home to roughly 700 languages, which amounts to about ten percent of the
global total, positioning it as the second-most linguistically diverse country after Papua New …

Cross-lingual cross-age group adaptation for low-resource elderly speech emotion recognition

S Cahyawijaya, H Lovenia, W Chung, R Frieske… - arXiv preprint arXiv …, 2023 - arxiv.org
Speech emotion recognition plays a crucial role in human-computer interactions. However,
most speech emotion recognition research is biased toward English-speaking adults, which …

Extraction and attribution of public figures statements for journalism in Indonesia using deep learning

YSP WP, YJ Kumar, NZ Zulkarnain, B Raza - Knowledge-Based Systems, 2024 - Elsevier
News articles are usually written by journalists based on statements taken from interviews
with public figures. Attribution from such statements provides important information and it …

Text Stemming and Lemmatization of Regional Languages in Indonesia: A Systematic Literature Review

Z Abidin, A Junaidi - Journal of Information Systems …, 2024 - e-journal.unair.ac.id
Background: Stemming is significantly essential in natural language processing (NLP) due
to the ability to minimize word variations to fundamental forms. This procedure facilitates the …