A survey of methods for addressing class imbalance in deep-learning based natural language processing

S Henning, W Beluch, A Fraser, A Friedrich - arXiv preprint arXiv …, 2022 - arxiv.org
Many natural language processing (NLP) tasks are naturally imbalanced, as some target
categories occur much more frequently than others in the real world. In such scenarios …

Fight fire with fire: Fine-tuning hate detectors using large samples of generated hate speech

T Wullach, A Adler, E Minkov - arXiv preprint arXiv:2109.00591, 2021 - arxiv.org
Automatic hate speech detection is hampered by the scarcity of labeled datasets, leading to
poor generalization. We employ pretrained language models (LMs) to alleviate this data …

A survey of multi-label text classification based on deep learning

X Chen, J Cheng, J Liu, W Xu, S Hua, Z Tang… - … Conference on Adaptive …, 2022 - Springer
Text classification (TC) is an important basic task in the field of Natural Language
Processing (NLP), and multi-label text classification (MLTC) is an important branch of TC …

Question answering versus named entity recognition for extracting unknown datasets

Y Younes, A Scherp - IEEE Access, 2023 - ieeexplore.ieee.org
Dataset mention extraction is a difficult problem due to the unstructured nature of text, the
sparsity of dataset mentions, and the various ways the same dataset can be mentioned …

Sentiment analysis on electricity Twitter posts

P Kaur, M Edalati - arXiv preprint arXiv:2206.05042, 2022 - arxiv.org
In today's world, everyone is expressive in some way, and the focus of this project is on
people's opinions about rising electricity prices in United Kingdom and India using data from …

Generative AI for hate speech detection: Evaluation and findings

S Pendzel, T Wullach, A Adler… - Regulating Hate Speech …, 2024 - taylorfrancis.com
Hate speech refers to the expression of hateful or violent attitudes based on group affiliation
such as race, nationality, religion, or sexual orientation. In light of the increasing prevalence …

Gaining insights into unrecognized user utterances in task-oriented dialog systems

E Rabinovich, M Vetzler, D Boaz, V Kumar… - arXiv preprint arXiv …, 2022 - arxiv.org
The rapidly growing market demand for automatic dialogue agents capable of goal-oriented
behavior has caused many tech-industry leaders to invest considerable efforts into task …

Multi-Stage Balanced Distillation: Addressing Long-Tail Challenges in Sequence-Level Knowledge Distillation

Y Zhou, J Zhu, P Xu, X Liu, X Wang, D Koutra… - arXiv preprint arXiv …, 2024 - arxiv.org
Large language models (LLMs) have significantly advanced various natural language
processing tasks, but deploying them remains computationally expensive. Knowledge …

Text Grafting: Near-Distribution Weak Supervision for Minority Classes in Text Classification

L Peng, Y Gu, C Dong, Z Wang, J Shang - arXiv preprint arXiv:2406.11115, 2024 - arxiv.org
For extremely weak-supervised text classification, pioneer research generates pseudo
labels by mining texts similar to the class names from the raw corpus, which may end up with …

MaQA: A Manual Text-Based Approach for Car-Specific Question Answering

C Park, S Jeong, J Kim - Electronics, 2024 - mdpi.com
In the past few years, intelligent virtual assistant technology has had a significant impact on
our daily lives, enabling us to easily access information through simple voice commands. In …