Toucan: Many-to-Many Translation for 150 African Language Pairs

AR Elmadany, I Adebara, M Abdul-Mageed - arXiv preprint arXiv …, 2024 - arxiv.org
We address a notable gap in Natural Language Processing (NLP) by introducing a
collection of resources designed to improve Machine Translation (MT) for low-resource …

From N-grams to Pre-trained Multilingual Models For Language Identification

T Sindane, V Marivate - arXiv preprint arXiv:2410.08728, 2024 - arxiv.org
In this paper, we investigate the use of N-gram models and Large Pre-trained Multilingual
models for Language Identification (LID) across 11 South African languages. For N-gram …

PuoBERTa: Training and evaluation of a curated language model for Setswana

V Marivate, M Mots' Oehli, V Wagner… - … African Conference for …, 2023 - Springer
Natural language processing (NLP) has made significant progress for well-resourced
languages such as English but lagged behind for low-resource languages like Setswana …

GlotCC: An Open Broad-Coverage CommonCrawl Corpus and Pipeline for Minority Languages

AH Kargaran, F Yvon, H Schütze - arXiv preprint arXiv:2410.23825, 2024 - arxiv.org
The need for large text corpora has increased with the advent of pretrained language
models and, in particular, the discovery of scaling laws for these models. Most available …

Building Text and Speech Benchmark Datasets and Models for Low‐Resourced East African Languages: Experiences and Lessons

J Nakatumba‐Nabende, C Babirye… - Applied AI …, 2024 - Wiley Online Library
Africa has over 2000 languages; however, those languages are not well represented in the
existing natural language processing ecosystem. African languages lack essential digital …

Can Artificial Intelligence Replace Assurance, Governance and Risk Management Professionals?

PR Nene - Risk Governance & Control: Financial Markets & …, 2024 - virtusinterpress.org
The digitalization of most businesses through the integration of artificial intelligence (AI)
presents a great threat to many professionals asking themselves if their skill set will still be …

Leveraging Bilingual Dictionaries for Improved Setswana-English Machine Translation: A Context-Aware Model

TG Moape, OO Olugbara, SO Ojo - … International Conference on …, 2024 - ieeexplore.ieee.org
There are several challenges that hinder the development of Setswana-to-English machine
translation systems. A key obstacle is the absence of machine-readable knowledge …

NGLUEni: Benchmarking and Adapting Pretrained Language Models for Nguni Languages

F Meyer, H Song, A Chakrabarty, J Buys… - Proceedings of the …, 2024 - aclanthology.org
The Nguni languages have over 20 million home language speakers in South Africa. There
has been considerable growth in the datasets for Nguni languages, but so far no analysis of …

Developing Bilingual English-Setswana Datasets for Space Domain

TG Moape, SO Ojo, OO Olugbara - Proceedings of the Fifth …, 2024 - aclanthology.org
In the current digital age, languages lacking digital presence face an imminent risk of
extinction. In addition, the absence of digital resources poses a significant obstacle to the …

Exploring Machine Translation for code-switching between English and Setswana in South African classrooms

K Mokoka - Deep Learning Indaba 2023 - openreview.net
One of the major challenges of the Department of Education in South Africa is the low
numeracy skills amongst South African learners. This study seeks to spotlight the low …