R Joshi - arXiv preprint arXiv:2211.11418, 2022 - arxiv.org
The monolingual Hindi BERT models currently available on the model hub do not perform better than the multilingual models on downstream tasks. We present L3Cube-HindBERT, a …
R Joshi - arXiv preprint arXiv:2202.01159, 2022 - arxiv.org
We present L3Cube-MahaCorpus, a Marathi monolingual dataset scraped from different internet sources. We expand the existing Marathi monolingual corpus with 24.8M sentences …
Bangla--ranked as the 6th most widely spoken language across the world (https://www.ethnologue.com/guides/ethnologue200), with 230 million native speakers--is still …
A Velankar, H Patil, R Joshi - IAPR Workshop on Artificial Neural Networks …, 2022 - Springer
Transformers are the most prominent architectures used for a wide range of Natural Language Processing tasks. These models are pre-trained over a large text corpus and are meant to …
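As a concrete illustration of the pre-train / encode pattern these transformer snippets refer to, here is a minimal sketch using the Hugging Face transformers library. It is my own illustration, not code from any of the cited papers; the checkpoint name "bert-base-multilingual-cased" is only a stand-in, and a monolingual checkpoint (e.g. an L3Cube model) could be substituted.

```python
# Minimal sketch: load a pre-trained BERT-style encoder from the model hub
# and obtain contextual token representations for one sentence.
from transformers import AutoTokenizer, AutoModel
import torch

model_name = "bert-base-multilingual-cased"  # example checkpoint, swap as needed
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)

sentence = "Transformers are pre-trained over a large text corpus."
inputs = tokenizer(sentence, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# One contextual vector per token; downstream tasks add a small task head on top.
print(outputs.last_hidden_state.shape)  # (1, num_tokens, hidden_size)
```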
Modern deep learning applications require increasingly more compute to train state-of-the-art models. To address this demand, large corporations and institutions use dedicated High …
Numerous methods have been developed in recent years to monitor the spread of negativity by removing vulgar, offensive, and hostile comments from social media platforms …
K Ghosh, A Senapati - Proceedings of the 36th Pacific Asia …, 2022 - aclanthology.org
Warning: This paper contains examples of language that some people may find offensive. Transformer-based language models have achieved state-of-the-art performance …
Authorship classification is the task of automatically determining the author of a text whose authorship is unknown. Although research on authorship classification has significantly …
We explore the impact of leveraging the relatedness of languages that belong to the same family in NLP models using multilingual fine-tuning. We hypothesize and validate that …
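To make the multilingual fine-tuning idea mentioned in this last snippet concrete, below is a toy sketch (my illustration, not the paper's code) of fine-tuning a single multilingual encoder on labelled examples from two related languages. The checkpoint name, the two-sentence in-memory dataset, and the labels are all placeholders.

```python
# Toy sketch: fine-tune one multilingual encoder with a shared classification
# head on examples drawn from related languages (here Hindi and Marathi).
import torch
from torch.utils.data import Dataset
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          Trainer, TrainingArguments)

model_name = "bert-base-multilingual-cased"  # assumed example checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

# Placeholder labelled examples mixing two related languages (invented labels).
texts = ["यह फिल्म बहुत अच्छी थी", "हा चित्रपट फार वाईट होता"]
labels = [1, 0]

class ToyDataset(Dataset):
    def __init__(self, texts, labels):
        self.enc = tokenizer(texts, truncation=True, padding=True)
        self.labels = labels
    def __len__(self):
        return len(self.labels)
    def __getitem__(self, i):
        item = {k: torch.tensor(v[i]) for k, v in self.enc.items()}
        item["labels"] = torch.tensor(self.labels[i])
        return item

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="out", num_train_epochs=1,
                           per_device_train_batch_size=2),
    train_dataset=ToyDataset(texts, labels),
)
trainer.train()
```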