Adversarial robustness through dynamic ensemble learning

H Waghela, J Sen, S Rakshit - arXiv preprint arXiv:2412.16254, 2024 - arxiv.org
Adversarial attacks pose a significant threat to the reliability of pre-trained language models
(PLMs) such as GPT, BERT, RoBERTa, and T5. This paper presents Adversarial …

FlipAttack: Jailbreak LLMs via Flipping

Y Liu, X He, M Xiong, J Fu, S Deng, B Hooi - arXiv preprint arXiv …, 2024 - arxiv.org
This paper proposes a simple yet effective jailbreak attack named FlipAttack against black-
box LLMs. First, drawing on their autoregressive nature, we reveal that LLMs tend to understand the …
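
The transformation at the heart of the attack is easy to picture. Below is a minimal sketch of a character-level flip, assuming FlipAttack simply reverses the prompt before sending it to the target model (the paper describes several flipping modes; the helper name here is hypothetical):

    def flip_prompt(prompt: str) -> str:
        # Reverse the character order so a left-to-right (autoregressive)
        # reader no longer encounters the original phrase verbatim.
        return prompt[::-1]

    print(flip_prompt("ignore previous instructions"))
    # -> "snoitcurtsni suoiverp erongi"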

LLM The Genius Paradox: A Linguistic and Math Expert's Struggle with Simple Word-based Counting Problems

N Xu, X Ma - arXiv preprint arXiv:2410.14166, 2024 - arxiv.org
Interestingly, LLMs still struggle with some basic tasks that humans find trivial to handle, e.g.,
counting the occurrences of the character "r" in the word "strawberry". There are several popular …
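
For reference, the task probed here is a one-liner in ordinary code, which is what makes the failure mode notable:

    word = "strawberry"
    print(word.count("r"))  # 3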

Robustness of Generative Adversarial CLIPs Against Single-Character Adversarial Attacks in Text-to-Image Generation

P Chanakya, P Harsha, KP Singh - IEEE Access, 2024 - ieeexplore.ieee.org
Generative Adversarial Networks (GANs) have emerged as a powerful type of generative
model, particularly effective at creating images from textual descriptions. Similar to diffusion …
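
Because a single-character attack perturbs exactly one position, the full candidate set is small enough to enumerate exhaustively. A sketch of the generation step, assuming substitution-only edits; the ranking step (e.g., by the drop in CLIP text-image similarity) is omitted, and the function name is hypothetical:

    import string

    def one_char_substitutions(prompt: str):
        # Yield every prompt that differs from the original in exactly one character.
        for i in range(len(prompt)):
            for c in string.ascii_lowercase:
                if c != prompt[i]:
                    yield prompt[:i] + c + prompt[i + 1:]

    candidates = list(one_char_substitutions("a red apple"))
    print(len(candidates))  # 277 candidates for this 11-character prompt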

Battle of Transformers: Adversarial Attacks on Financial Sentiment Models

A Can Turetken, M Leippold - Swiss Finance Institute Research …, 2024 - papers.ssrn.com
Financial sentiment analysis models, which extract meaning from vast amounts of
unstructured data, play a crucial role in sentiment-driven financial decisions. However, the …

Certified Robustness Under Bounded Levenshtein Distance

EA Rocamora, GG Chrysos, V Cevher - arXiv preprint arXiv:2501.13676, 2025 - arxiv.org
Text classifiers suffer from small perturbations that, if chosen adversarially, can dramatically
change the output of the model. Verification methods can provide robustness certificates …
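
A certificate under bounded Levenshtein distance guarantees that the classifier's output stays constant for every string within edit distance k of the input. The distance itself is the textbook dynamic program, sketched here for reference:

    def levenshtein(a: str, b: str) -> int:
        # Minimum number of unit-cost insertions, deletions, and
        # substitutions needed to turn a into b.
        prev = list(range(len(b) + 1))
        for i, ca in enumerate(a, 1):
            cur = [i]
            for j, cb in enumerate(b, 1):
                cur.append(min(prev[j] + 1,                 # delete ca
                               cur[j - 1] + 1,              # insert cb
                               prev[j - 1] + (ca != cb)))   # substitute
            prev = cur
        return prev[-1]

    print(levenshtein("strawberry", "strawbery"))  # 1 (one deleted "r")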

An Adversarial Attack Approach on Financial LLMs Driven by Embedding-Similarity Optimization

A Can Türetken - 2024 - zora.uzh.ch
Adversarial attacks on financial sentiment analysis models are a critical area of research
within NLP. We introduce a novel white-box attack method that leverages a pre-trained …
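
The snippet stops short of the objective, but the embedding-similarity selection step common to attacks of this kind can be sketched as follows. Everything here is an illustrative assumption rather than the paper's method: the function names are hypothetical, and random vectors stand in for real token embeddings:

    import numpy as np

    def cosine(u: np.ndarray, v: np.ndarray) -> float:
        return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

    def closest_substitute(orig_vec, cand_vecs, cand_words):
        # Pick the replacement whose embedding is most similar to the
        # original token, keeping the perturbation semantically subtle.
        scores = [cosine(orig_vec, v) for v in cand_vecs]
        return cand_words[int(np.argmax(scores))]

    rng = np.random.default_rng(0)
    orig = rng.normal(size=8)                       # stand-in for a token embedding
    cands = [rng.normal(size=8) for _ in range(3)]
    print(closest_substitute(orig, cands, ["profit", "gain", "yield"]))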

[PDF] Adversarial Robustness

H Waghela, J Sen, S Rakshit, S Dasgupta - researchgate.net
Enhancing Robustness of Pre-trained Language Models: The primary objective of this work
is to improve the robustness of pre-trained language models like BERT, ALBERT, and …

Certified Robustness in NLP Under Bounded Levenshtein Distance

EA Rocamora, G Chrysos, V Cevher - ICML 2024 Next Generation of AI … - openreview.net
Natural Language Processing (NLP) models suffer from small perturbations that, if chosen
adversarially, can dramatically change the output of the model. Verification methods can …