L3Cube-HindBERT and DevBERT: Pre-trained BERT transformer models for Devanagari based Hindi and Marathi languages

R Joshi - arXiv preprint arXiv:2211.11418, 2022 - arxiv.org
The monolingual Hindi BERT models currently available on the model hub do not perform
better than the multi-lingual models on downstream tasks. We present L3Cube-HindBERT, a …

L3Cube-MahaCorpus and MahaBERT: Marathi monolingual corpus, Marathi BERT language models, and resources

R Joshi - arXiv preprint arXiv:2202.01159, 2022 - arxiv.org
We present L3Cube-MahaCorpus, a Marathi monolingual data set scraped from different
internet sources. We expand the existing Marathi monolingual corpus with 24.8 M sentences …

Mono vs multilingual BERT for hate speech detection and text classification: A case study in Marathi

A Velankar, H Patil, R Joshi - IAPR Workshop on Artificial Neural Networks …, 2022 - Springer
Transformers are the most eminent architectures used for a vast range of Natural Language
Processing tasks. These models are pre-trained over a large text corpus and are meant to …

Hate and offensive speech detection in Hindi and Marathi

A Velankar, H Patil, A Gore, S Salunke… - arXiv preprint arXiv …, 2021 - arxiv.org
Sentiment analysis is the most basic NLP task to determine the polarity of text data. There
has been a significant amount of work in the area of multilingual text as well. Still hate and …

L3Cube-MahaHate: A tweet-based Marathi hate speech detection dataset and BERT models

A Velankar, H Patil, A Gore, S Salunke… - arXiv preprint arXiv …, 2022 - arxiv.org
Social media platforms are used by a large number of people prominently to express their
thoughts and opinions. However, these platforms have contributed to a substantial amount …

An unsupervised annotation of Arabic texts using multi-label topic modeling and genetic algorithm

HA Almuzaini, AM Azmi - Expert Systems with Applications, 2022 - Elsevier
Every day the world produces an enormous amount of textual data. This unstructured text is
of little use unless it is labeled using a combination of categories, keywords, tags. Humans …

L3Cube-MahaNLP: Marathi natural language processing datasets, models, and library

R Joshi - arXiv preprint arXiv:2205.14728, 2022 - arxiv.org
Despite being the third most popular language in India, the Marathi language lacks useful
NLP resources. Moreover, popular NLP libraries do not have support for the Marathi …

Comparative study of long document classification

V Wagh, S Khandve, I Joshi, A Wani… - TENCON 2021-2021 …, 2021 - ieeexplore.ieee.org
The amount of information stored in the form of documents on the internet has been
increasing rapidly. Thus it has become a necessity to organize and maintain these …

A survey on NLP resources, tools, and techniques for Marathi language processing

P Lahoti, N Mittal, G Singh - ACM Transactions on Asian and Low …, 2022 - dl.acm.org
Natural Language Processing (NLP) has been in practice for the past couple of decades,
and extensive work has been done for the Western languages, particularly the English …

L3Cube-MahaNER: A Marathi named entity recognition dataset and BERT models

O Litake, MR Sabane, PS Patil… - Proceedings of the …, 2022 - aclanthology.org
Named Entity Recognition (NER) is a basic NLP task and finds major applications
in conversational and search systems. It helps us identify key entities in a sentence used for …