Demarked: A Strategy for Enhanced Abusive Speech Moderation through Counterspeech, Detoxification, and Message Management

SM Yimam, D Dementieva, T Fischer… - arXiv preprint arXiv …, 2024 - arxiv.org
Despite regulations imposed by nations and social media platforms, such as recent EU
regulations targeting digital violence, abusive content persists as a significant challenge …

Perceiving and countering hate: The role of identity in online responses

K Ping, J Hawdon, E Rho - arXiv preprint arXiv:2411.01675, 2024 - arxiv.org
This study investigates how online counterspeech, defined as direct responses to harmful
online content with the intention of dissuading the perpetrator from further engaging in such …

When in Doubt, Cascade: Towards Building Efficient and Capable Guardrails

M Nagireddy, I Padhi, S Ghosh, P Sattigeri - arXiv preprint arXiv …, 2024 - arxiv.org
Large language models (LLMs) have convincing performance in a variety of downstream
tasks. However, these systems are prone to generating undesirable outputs such as harmful …

Adaptable Moral Stances of Large Language Models on Sexist Content: Implications for Society and Gender Discourse

R Guo, I Nejadgholi, H Dawkins, KC Fraser… - arXiv preprint arXiv …, 2024 - arxiv.org
This work provides an explanatory view of how LLMs can apply moral reasoning to both
criticize and defend sexist language. We assessed eight large language models, all of which …

Examining Human-AI Collaboration for Co-Writing Constructive Comments Online

F Shahid, M Dittgen, M Naaman… - arXiv preprint arXiv …, 2024 - arxiv.org
This paper examines how large language models (LLMs) can help people write constructive
comments in online debates on divisive social issues and whether the notions of …