Pre-trained language models for text generation: A survey

J Li, T Tang, WX Zhao, JY Nie, JR Wen - ACM Computing Surveys, 2024 - dl.acm.org
Text Generation aims to produce plausible and readable text in human language from input
data. The resurgence of deep learning has greatly advanced this field, in particular, with the …

NusaCrowd: Open source initiative for Indonesian NLP resources

S Cahyawijaya, H Lovenia, AF Aji… - Findings of the …, 2023 - aclanthology.org
We present NusaCrowd, a collaborative initiative to collect and unify existing resources for
Indonesian languages, including opening access to previously non-public resources …

Naamapadam: a large-scale named entity annotated data for Indic languages

A Mhaske, H Kedia, S Doddapaneni… - arXiv preprint arXiv …, 2022 - arxiv.org
We present Naamapadam, the largest publicly available Named Entity Recognition (NER)
dataset for the 11 major Indian languages from two language families. The dataset contains …

RomanSetu: Efficiently unlocking multilingual capabilities of large language models via romanization

J Jaavid, R Dabre, M Aswanth, J Gala… - Proceedings of the …, 2024 - aclanthology.org
This study addresses the challenge of extending Large Language Models (LLMs) to
non-English languages, specifically those using non-Roman scripts. We propose an approach …

Towards robust automated math problem solving: a survey of statistical and deep learning approaches

A Saraf, P Kamat, S Gite, S Kumar, K Kotecha - Evolutionary Intelligence, 2024 - Springer
Automated mathematical problem-solving represents a unique intersection of natural
language processing (NLP) and mathematical reasoning, posing significant challenges in …

Airavata: Introducing Hindi instruction-tuned LLM

J Gala, T Jayakumar, JA Husain, MSUR Khan… - arXiv preprint arXiv …, 2024 - arxiv.org
We announce the initial release of "Airavata," an instruction-tuned LLM for Hindi. Airavata
was created by fine-tuning OpenHathi with diverse, instruction-tuning Hindi datasets to make …

Transformer based answer-aware Bengali question generation

JF Ruma, TT Mayeesha, RM Rahman - International Journal of Cognitive …, 2023 - Elsevier
Question generation (QG), the task of generating questions from text or other forms of data, a
significant and challenging subject, has recently attracted more attention in natural language …

PMIndiaSum: Multilingual and cross-lingual headline summarization for languages in India

A Urlana, P Chen, Z Zhao, SB Cohen… - arXiv preprint arXiv …, 2023 - arxiv.org
This paper introduces PMIndiaSum, a multilingual and massively parallel summarization
corpus focused on languages in India. Our corpus provides a training and testing ground for …

mEdIT: Multilingual text editing via instruction tuning

V Raheja, D Alikaniotis, V Kulkarni, B Alhafni… - arXiv preprint arXiv …, 2024 - arxiv.org
We introduce mEdIT, a multi-lingual extension to CoEdIT--the recent state-of-the-art text
editing models for writing assistance. mEdIT models are trained by fine-tuning multi-lingual …

LexSumm and LexT5: Benchmarking and Modeling Legal Summarization Tasks in English

T Santosh, C Weiss, M Grabmair - arXiv preprint arXiv:2410.09527, 2024 - arxiv.org
In the evolving NLP landscape, benchmarks serve as yardsticks for gauging progress.
However, existing Legal NLP benchmarks only focus on predictive tasks, overlooking …