Arap-Tweet: A large multi-dialect Twitter corpus for gender, age and language variety identification

W Zaghouani, A Charfi - arXiv preprint arXiv:1808.07674, 2018 - arxiv.org
In this paper, we present Arap-Tweet, which is a large-scale and multi-dialectal corpus of
Tweets from 11 regions and 16 countries in the Arab world representing the major Arabic …

Homograph disambiguation through selective diacritic restoration

S Alqahtani, H Aldarmaki, M Diab - arXiv preprint arXiv:1912.04479, 2019 - arxiv.org
Lexical ambiguity, a challenging phenomenon in all natural languages, is particularly
prevalent for languages with diacritics that tend to be omitted in writing, such as Arabic …

Automatic diacritics restoration for Tunisian dialect

A Masmoudi, S Mdhaffar, R Sellami… - ACM Transactions on …, 2019 - dl.acm.org
Modern Standard Arabic, as well as Arabic dialect languages, are usually written without
diacritics. The absence of these marks constitutes a real problem in the automatic processing …

So hateful! Building a multi-label hate speech annotated Arabic dataset

W Zaghouani, H Mubarak… - Proceedings of the 2024 …, 2024 - aclanthology.org
Social media enables widespread propagation of hate speech targeting groups based on
ethnicity, religion, or other characteristics. With manual content moderation being infeasible …

MARASTA: A multi-dialectal Arabic cross-domain stance corpus

A Charfi, M Ben-Sghaier, ASR Atalla… - Proceedings of the …, 2024 - aclanthology.org
This paper introduces a cross-domain and multi-dialectal stance corpus for Arabic that
includes four regions in the Arab World and covers the main Arabic dialect groups. Our …

A layered language model based hybrid approach to automatic full diacritization of Arabic

M Al-Badrashiny, A Hawwari… - Proceedings of the Third …, 2017 - aclanthology.org
In this paper we present a system for automatic Arabic text diacritization using three levels of
analysis granularity in a layered back off manner. We build and exploit diacritized language …

Characteristic of Teaching Materials for Arabic Reading Skill with Inductive Approach

H Istiqomah, MJH Al-Badrani - Izdihar: Journal of Arabic …, 2020 - ejournal.umm.ac.id
Himmati is deliberately created as a textbook modification to recognize the Qur'an prepared
for beginners. This research aimed to describe the characteristics of teaching material …

Automatic diacritization of Tunisian dialect text using SMT model

A Masmoudi, C Aloulou, AGS Abdellahi… - International Journal of …, 2022 - Springer
Unlike other tongues, Arabic language is characterized by its written form which is
essentially consonant and may not have short vowels. One of the major functions of short …

Towards a high-quality lemma-based text to speech system for the Arabic language

O Zine, A Meziane, M Boudchiche - … 2017, Fez, Morocco, October 11–12 …, 2018 - Springer
Recent numbers put the Arabic language at around 250 million native speakers, making it
the fifth spoken language regarding the number of speakers. Therefore, it has gained the …

MANDIAC: A web-based annotation system for manual Arabic diacritization

O Obeid, H Bouamor, W Zaghouani… - The 2nd Workshop on …, 2016 - researchgate.net
In this paper, we introduce MANDIAC, a web-based annotation system designed for rapid
manual diacritization of Standard Arabic text. To expedite the annotation process, the system …