Token-Ensemble text generation: On attacking the automatic AI-generated text detection

F Huang, H Kwak, J An - arXiv preprint arXiv:2402.11167, 2024 - arxiv.org
The robustness of AI-content detection models against cultivated attacks (e.g., paraphrasing or word switching) remains a significant concern. This study proposes a novel token-ensemble generation strategy to challenge the robustness of current AI-content detection approaches. We explore the ensemble attack strategy by completing the prompt with the next token generated by randomly selected candidate LLMs. We find that the token-ensemble approach significantly degrades the performance of AI-content detection models. (The code and test sets will be released.) Our findings reveal that token-ensemble generation poses a serious challenge to current detection models and underline the need for advancing detection technologies to counter sophisticated adversarial strategies.
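
The generation loop described in the abstract can be illustrated with a minimal sketch. This is not the authors' released code: the function `token_ensemble_generate`, the helper `make_toy_model`, the `eos_token` convention, and the per-step uniform choice of model are all assumptions made for illustration. The "models" are plain Python callables standing in for LLMs, and tokens are plain strings, which sidesteps the fact that real candidate LLMs may use different tokenizers.

```python
import random
from typing import Callable, List, Sequence

# Hypothetical stand-in for an LLM: maps the tokens so far to one next token.
NextTokenFn = Callable[[Sequence[str]], str]


def token_ensemble_generate(
    prompt_tokens: Sequence[str],
    models: Sequence[NextTokenFn],
    max_new_tokens: int,
    rng: random.Random,
    eos_token: str = "<eos>",
) -> List[str]:
    """Extend the prompt one token at a time, picking a random model per step."""
    tokens = list(prompt_tokens)
    for _ in range(max_new_tokens):
        # Assumption: the candidate model is drawn uniformly at every step.
        model = rng.choice(models)
        next_token = model(tokens)
        if next_token == eos_token:
            break  # stop as soon as any model signals end of sequence
        tokens.append(next_token)
    return tokens


def make_toy_model(vocab: Sequence[str]) -> NextTokenFn:
    """Build a deterministic toy 'model' that cycles through its own vocabulary."""
    def next_token(tokens: Sequence[str]) -> str:
        return vocab[len(tokens) % len(vocab)]
    return next_token
```

With two toy models built from `make_toy_model(["a", "b"])` and `make_toy_model(["x", "y", "z"])`, the output interleaves tokens from both vocabularies, so no single model's token distribution governs the whole text. That mixing is the property the abstract reports as degrading detector performance.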