Lexical ambiguity, a challenging phenomenon in all natural languages, is particularly prevalent for languages with diacritics that tend to be omitted in writing, such as Arabic …
We investigate two research questions: (1) how do machine translation (MT) and diacritization influence the performance of each other in a multi-task learning setting (2) the …
S Elgamal, O Obeid, T Kabbani, G Inoue… - arXiv preprint arXiv …, 2024 - arxiv.org
The widespread absence of diacritical marks in Arabic text poses a significant challenge for Arabic natural language processing (NLP). This paper explores instances of naturally …
M Abbache, A Abbache, J Xu, F Meziane… - ACM Transactions on …, 2023 - dl.acm.org
Word embedding is used to represent words for text analysis. It plays an essential role in many Natural Language Processing (NLP) studies and has hugely contributed to the …
S Alqahtani, M Diab - 2019 18th IEEE International Conference …, 2019 - ieeexplore.ieee.org
Diacritic restoration is the task of assigning diacritics (accents) for each character in a given segment. The typical input levels that have been previously used in diacritic restoration …
Diacritization plays a pivotal role in meaning disambiguation and in improving readability in Arabic texts. Efforts have long focused on marking every eligible character (Full …
Languages that include diacritics in speech but omit diacritics in writing to a certain degree result in written texts that are even more ambiguous than typically expected. Not including …
This paper introduces a lexical resource, ARLEX, for Modern Standard Arabic (MSA) that explicitly lists ambiguity at the lexical and syntactic levels for each token. Arabic orthography …