Detecting and understanding harmful memes: A survey

S Sharma, F Alam, MS Akhtar, D Dimitrov… - arXiv preprint arXiv …, 2022 - arxiv.org
The automatic identification of harmful content online is of major concern for social media
platforms, policymakers, and society. Researchers have studied textual, visual, and audio …

Risk taxonomy, mitigation, and assessment benchmarks of large language model systems

T Cui, Y Wang, C Fu, Y Xiao, S Li, X Deng, Y Liu… - arXiv preprint arXiv …, 2024 - arxiv.org
Large language models (LLMs) have strong capabilities in solving diverse natural language
processing tasks. However, the safety and security issues of LLM systems have become the …

Emojis as anchors to detect Arabic offensive language and hate speech

H Mubarak, S Hassan, SA Chowdhury - Natural Language …, 2023 - cambridge.org
We introduce a generic, language-independent method to collect a large percentage of
offensive and hate tweets regardless of their topics or genres. We harness the extralinguistic …

Counterfactually augmented data and unintended bias: The case of sexism and hate speech detection

I Sen, M Samory, C Wagner, I Augenstein - arXiv preprint arXiv …, 2022 - arxiv.org
Counterfactually Augmented Data (CAD) aims to improve out-of-domain generalizability, an
indicator of model robustness. The improvement is credited with promoting core features of …

Effectiveness of Artificial Intelligence–Based Cyberbullying Interventions From Youth Perspective

T Milosevic, K Verma, M Carter, S Vigil… - Social Media+ …, 2023 - journals.sagepub.com
Cyberbullying on social media continues to pose a significant problem for children and
young people, and platforms increasingly rely on artificial intelligence (AI) to address it …

How does counterfactually augmented data impact models for social computing constructs?

I Sen, M Samory, F Flöck, C Wagner… - arXiv preprint arXiv …, 2021 - arxiv.org
As NLP models are increasingly deployed in socially situated settings such as online
abusive content detection, it is crucial to ensure that these models are robust. One way of …

Improving generalizability in implicitly abusive language detection with concept activation vectors

I Nejadgholi, KC Fraser, S Kiritchenko - arXiv preprint arXiv:2204.02261, 2022 - arxiv.org
Robustness of machine learning models on ever-changing real-world data is critical,
especially for applications affecting human well-being such as content moderation. New …

Unsupervised domain adaptation for hate speech detection using a data augmentation approach

SM Sarwar, V Murdock - … of the International AAAI Conference on Web …, 2022 - ojs.aaai.org
Online harassment in the form of hate speech has been on the rise in recent years.
Addressing the issue requires a combination of content moderation by people, aided by …

Overview of abusive and threatening language detection in Urdu at FIRE 2021

M Amjad, A Zhila, G Sidorov, A Labunets… - arXiv preprint arXiv …, 2022 - arxiv.org
As the influence of social media platforms grows, the effects of their misuse become more
and more impactful. The importance of automatic detection of threatening and abusive …

Leveraging machine translation for cross-lingual fine-grained cyberbullying classification amongst pre-adolescents

K Verma, M Popović, A Poulis… - Natural Language …, 2023 - cambridge.org
Cyberbullying is the wilful and repeated infliction of harm on an individual using the Internet
and digital technologies. Similar to face-to-face bullying, cyberbullying can be captured …