A survey on data augmentation for text classification

M Bayer, MA Kaufhold, C Reuter - ACM Computing Surveys, 2022 - dl.acm.org
Data augmentation, the artificial creation of training data for machine learning by
transformations, is a widely studied research field across machine learning disciplines …

Explainable AI: A review of machine learning interpretability methods

P Linardatos, V Papastefanopoulos, S Kotsiantis - Entropy, 2020 - mdpi.com
Recent advances in artificial intelligence (AI) have led to its widespread industrial adoption,
with machine learning systems demonstrating superhuman performance in a significant …

Universal and transferable adversarial attacks on aligned language models

A Zou, Z Wang, N Carlini, M Nasr, JZ Kolter… - arXiv preprint arXiv …, 2023 - arxiv.org
Because" out-of-the-box" large language models are capable of generating a great deal of
objectionable content, recent work has focused on aligning these models in an attempt to …

Are aligned neural networks adversarially aligned?

N Carlini, M Nasr… - Advances in …, 2024 - proceedings.neurips.cc
Large language models are now tuned to align with the goals of their creators, namely to be
"helpful and harmless." These models should respond helpfully to user questions, but refuse …

Hard prompts made easy: Gradient-based discrete optimization for prompt tuning and discovery

Y Wen, N Jain, J Kirchenbauer… - Advances in …, 2024 - proceedings.neurips.cc
The strength of modern generative models lies in their ability to be controlled through
prompts. Hard prompts comprise interpretable words and tokens, and are typically hand …

Red teaming language models with language models

E Perez, S Huang, F Song, T Cai, R Ring… - arXiv preprint arXiv …, 2022 - arxiv.org
Language Models (LMs) often cannot be deployed because of their potential to harm users
in hard-to-predict ways. Prior work identifies harmful behaviors before deployment by using …

Foundational challenges in assuring alignment and safety of large language models

U Anwar, A Saparov, J Rando, D Paleka… - arXiv preprint arXiv …, 2024 - arxiv.org
This work identifies 18 foundational challenges in assuring the alignment and safety of large
language models (LLMs). These challenges are organized into three different categories …

Visual adversarial examples jailbreak aligned large language models

X Qi, K Huang, A Panda, P Henderson… - Proceedings of the …, 2024 - ojs.aaai.org
Warning: this paper contains data, prompts, and model outputs that are offensive in nature.
Recently, there has been a surge of interest in integrating vision into Large Language …

On the adversarial robustness of multi-modal foundation models

C Schlarmann, M Hein - Proceedings of the IEEE/CVF …, 2023 - openaccess.thecvf.com
Multi-modal foundation models combining vision and language models such as Flamingo or
GPT-4 have recently gained enormous interest. Alignment of foundation models is used to …

Automatically auditing large language models via discrete optimization

E Jones, A Dragan, A Raghunathan… - International …, 2023 - proceedings.mlr.press
Auditing large language models for unexpected behaviors is critical to preempt catastrophic
deployments, yet remains challenging. In this work, we cast auditing as an optimization …