A new generation of Perspective API: Efficient multilingual character-level transformers

A Lees, VQ Tran, Y Tay, J Sorensen, J Gupta… - Proceedings of the 28th …, 2022 - dl.acm.org
On the world wide web, toxic content detectors are a crucial line of defense against
potentially hateful and offensive messages. As such, building highly effective classifiers that …

Language generation models can cause harm: So what can we do about it? an actionable survey

S Kumar, V Balachandran, L Njoo… - arXiv preprint arXiv …, 2022 - arxiv.org
Recent advances in the capacity of large language models to generate human-like text have
resulted in their increased adoption in user-facing settings. In parallel, these improvements …

ToxiSpanSE: An explainable toxicity detection in code review comments

J Sarker, S Sultana, SR Wilson… - 2023 ACM/IEEE …, 2023 - ieeexplore.ieee.org
Background: The existence of toxic conversations in open-source platforms can degrade
relationships among software developers and may negatively impact software product …

AI safety in generative AI large language models: A survey

J Chua, Y Li, S Yang, C Wang, L Yao - arXiv preprint arXiv:2407.18369, 2024 - arxiv.org
Large Language Models (LLMs) such as ChatGPT that exhibit generative AI capabilities are
facing accelerated adoption and innovation. The increased presence of Generative AI (GAI) …

Automated identification of toxic code reviews using ToxiCR

J Sarker, AK Turzo, M Dong, A Bosu - ACM Transactions on Software …, 2023 - dl.acm.org
Toxic conversations during software development interactions may have serious
repercussions on a Free and Open Source Software (FOSS) development project. For …

A Taxonomy of Rater Disagreements: Surveying Challenges & Opportunities from the Perspective of Annotating Online Toxicity

W Zhang, H Guo, ID Kivlichan, V Prabhakaran… - arXiv preprint arXiv …, 2023 - arxiv.org
Toxicity is an increasingly common and severe issue in online spaces. Consequently, a rich
line of machine learning research over the past decade has focused on computationally …

A benchmark study of the contemporary toxicity detectors on software engineering interactions

J Sarker, AK Turzo, A Bosu - 2020 27th Asia-Pacific Software …, 2020 - ieeexplore.ieee.org
Automated filtering of toxic conversations may help an Open-source software (OSS)
community to maintain healthy interactions among the project participants. Although, several …

Robustness of models addressing Information Disorder: A comprehensive review and benchmarking study

G Fenza, V Loia, C Stanzione, M Di Gisi - Neurocomputing, 2024 - Elsevier
Machine learning and deep learning models are increasingly susceptible to
adversarial attacks, particularly in critical areas like cybersecurity and Information Disorder …

OCR post-correction for detecting adversarial text images

NH Imam, VG Vassilakis, D Kolovos - Journal of Information Security and …, 2022 - Elsevier
The amount of images with embedded text shared on Online Social Networks (OSNs), such
as Twitter or Facebook, has been growing in recent years. It is becoming important to …

Toxic comment classification

S Zaheri, J Leath, D Stroud - SMU Data Science Review, 2020 - scholar.smu.edu
This paper presents a novel application of Natural Language Processing techniques to
classify unstructured text into toxic and non-toxic categories. In the current century, social …