IndicNLG benchmark: Multilingual datasets for diverse NLG tasks in Indic languages

J Li, T Tang, WX Zhao, JY Nie, JR Wen - ACM Computing Surveys, 2024 - dl.acm.org

Text Generation aims to produce plausible and readable text in human language from input
data. The resurgence of deep learning has greatly advanced this field, in particular, with the …

Cited by 396

NusaCrowd: Open source initiative for Indonesian NLP resources

S Cahyawijaya, H Lovenia, AF Aji… - Findings of the …, 2023 - aclanthology.org

We present NusaCrowd, a collaborative initiative to collect and unify existing resources for
Indonesian languages, including opening access to previously non-public resources …

Cited by 969

Naamapadam: a large-scale named entity annotated data for Indic languages

A Mhaske, H Kedia, S Doddapaneni… - arXiv preprint arXiv …, 2022 - arxiv.org

We present, Naamapadam, the largest publicly available Named Entity Recognition (NER)
dataset for the 11 major Indian languages from two language families. The dataset contains …

Cited by 20

RomanSetu: Efficiently unlocking multilingual capabilities of large language models via romanization

J Jaavid, R Dabre, M Aswanth, J Gala… - Proceedings of the …, 2024 - aclanthology.org

This study addresses the challenge of extending Large Language Models (LLMs) to
non-English languages, specifically those using non-Roman scripts. We propose an approach …

Cited by 3

Towards robust automated math problem solving: a survey of statistical and deep learning approaches

A Saraf, P Kamat, S Gite, S Kumar, K Kotecha - Evolutionary Intelligence, 2024 - Springer

Automated mathematical problem-solving represents a unique intersection of natural
language processing (NLP) and mathematical reasoning, posing significant challenges in …

Airavata: Introducing Hindi instruction-tuned LLM

J Gala, T Jayakumar, JA Husain, MSUR Khan… - arXiv preprint arXiv …, 2024 - arxiv.org

We announce the initial release of "Airavata," an instruction-tuned LLM for Hindi. Airavata
was created by fine-tuning OpenHathi with diverse, instruction-tuning Hindi datasets to make …

Cited by 8

Transformer based answer-aware Bengali question generation

JF Ruma, TT Mayeesha, RM Rahman - International Journal of Cognitive …, 2023 - Elsevier

Question generation (QG), the task of generating questions from text or other forms of data, a
significant and challenging subject, has recently attracted more attention in natural language …

Cited by 6

PMIndiaSum: Multilingual and cross-lingual headline summarization for languages in India

A Urlana, P Chen, Z Zhao, SB Cohen… - arXiv preprint arXiv …, 2023 - arxiv.org

This paper introduces PMIndiaSum, a multilingual and massively parallel summarization
corpus focused on languages in India. Our corpus provides a training and testing ground for …

Cited by 9

mEdIT: Multilingual text editing via instruction tuning

V Raheja, D Alikaniotis, V Kulkarni, B Alhafni… - arXiv preprint arXiv …, 2024 - arxiv.org

We introduce mEdIT, a multi-lingual extension to CoEdIT--the recent state-of-the-art text
editing models for writing assistance. mEdIT models are trained by fine-tuning multi-lingual …

Cited by 3

LexSumm and LexT5: Benchmarking and Modeling Legal Summarization Tasks in English

T Santosh, C Weiss, M Grabmair - arXiv preprint arXiv:2410.09527, 2024 - arxiv.org

In the evolving NLP landscape, benchmarks serve as yardsticks for gauging progress.
However, existing Legal NLP benchmarks only focus on predictive tasks, overlooking …

Cited by 1
