Natural language processing for dialects of a language: A survey

A Joshi, R Dabre, D Kanojia, Z Li, H Zhan… - ACM Computing …, 2024 - dl.acm.org
State-of-the-art natural language processing (NLP) models are trained on massive
corpora and report superlative performance on evaluation datasets. This survey delves …

Jais and jais-chat: Arabic-centric foundation and instruction-tuned open generative large language models

N Sengupta, SK Sahu, B Jia, S Katipomu, H Li… - arXiv preprint arXiv …, 2023 - arxiv.org
We introduce Jais and Jais-chat, new state-of-the-art Arabic-centric foundation and
instruction-tuned open generative large language models (LLMs). The models are based on …

Having beer after prayer? Measuring cultural bias in large language models

T Naous, MJ Ryan, A Ritter, W Xu - arXiv preprint arXiv:2305.14456, 2023 - arxiv.org
As the reach of large language models (LMs) expands globally, their ability to cater to
diverse cultural contexts becomes crucial. Despite advancements in multilingual …

AraT5: Text-to-text transformers for Arabic language generation

EMB Nagoudi, AR Elmadany… - arXiv preprint arXiv …, 2021 - arxiv.org
Transfer learning with a unified Transformer framework (T5) that converts all language
problems into a text-to-text format was recently proposed as a simple and effective transfer …

AraFinNLP 2024: The first Arabic financial NLP shared task

S Malaysha, M El-Haj, S Ezzini, M Khalilia… - arXiv preprint arXiv …, 2024 - arxiv.org
The expanding financial markets of the Arab world require sophisticated Arabic NLP tools.
To address this need within the banking domain, the Arabic Financial NLP (AraFinNLP) …

WojoodNER 2023: The First Arabic Named Entity Recognition Shared Task

M Jarrar, M Abdul-Mageed, M Khalilia… - arXiv preprint arXiv …, 2023 - arxiv.org
We present WojoodNER-2023, the first Arabic Named Entity Recognition (NER) Shared
Task. The primary focus of WojoodNER-2023 is on Arabic NER, offering novel NER datasets …

ACOM: Arabic Comparative Opinion Mining in Social Media Utilizing Word Embedding, Deep Learning Model & LLM-GPT

A Bayazed, H Almagrabi, D Alahmadi… - IEEE Access, 2024 - ieeexplore.ieee.org
Reliance on social networks has become an integral part of modern daily activities. Social
networks are crowded with vast numbers of comments, opinions, and beliefs about different …

AraBART: a pretrained Arabic sequence-to-sequence model for abstractive summarization

MK Eddine, N Tomeh, N Habash, JL Roux… - arXiv preprint arXiv …, 2022 - arxiv.org
Like most natural language understanding and generation tasks, state-of-the-art models for
summarization are transformer-based sequence-to-sequence architectures that are …

DziriBERT: a pre-trained language model for the Algerian dialect

A Abdaoui, M Berrimi, M Oussalah… - arXiv preprint arXiv …, 2021 - arxiv.org
Pre-trained transformers are now the de facto models in Natural Language Processing given
their state-of-the-art results in many tasks and languages. However, most of the current …

A benchmark for evaluating Arabic contextualized word embedding models

A Elnagar, S Yagi, Y Mansour, L Lulu… - Information Processing & …, 2023 - Elsevier
Word embeddings, which represent words as numerical vectors in a high-dimensional
space, are contextualized by generating a unique vector representation for each sense of a …