Explainable AI: A review of machine learning interpretability methods

P Linardatos, V Papastefanopoulos, S Kotsiantis - Entropy, 2020 - mdpi.com
Recent advances in artificial intelligence (AI) have led to its widespread industrial adoption,
with machine learning systems demonstrating superhuman performance in a significant …

Post-hoc interpretability for neural NLP: A survey

A Madsen, S Reddy, S Chandar - ACM Computing Surveys, 2022 - dl.acm.org
Neural networks for NLP are becoming increasingly complex and widespread, and there is
growing concern about whether these models are responsible to use. Explaining models helps to address …

Prompting GPT-3 to be reliable

C Si, Z Gan, Z Yang, S Wang, J Wang… - arXiv preprint arXiv …, 2022 - arxiv.org
Large language models (LLMs) show impressive abilities via few-shot prompting.
Commercialized APIs such as OpenAI GPT-3 further increase their use in real-world …

Language (technology) is power: A critical survey of "bias" in NLP

SL Blodgett, S Barocas, H Daumé III… - arXiv preprint arXiv …, 2020 - arxiv.org
We survey 146 papers analyzing "bias" in NLP systems, finding that their motivations are
often vague, inconsistent, and lacking in normative reasoning, despite the fact that …

TextAttack: A framework for adversarial attacks, data augmentation, and adversarial training in NLP

JX Morris, E Lifland, JY Yoo, J Grigsby, D Jin… - arXiv preprint arXiv …, 2020 - arxiv.org
While there has been substantial research using adversarial attacks to analyze NLP models,
each attack is implemented in its own code repository. It remains challenging to develop …

Measure and improve robustness in NLP models: A survey

X Wang, H Wang, D Yang - arXiv preprint arXiv:2112.08313, 2021 - arxiv.org
As NLP models have achieved state-of-the-art performance on benchmarks and gained wide
application, it has become increasingly important to ensure the safe deployment of these …

An empirical survey of data augmentation for limited data learning in NLP

J Chen, D Tam, C Raffel, M Bansal… - Transactions of the …, 2023 - direct.mit.edu
NLP has achieved great progress in the past decade through the use of neural models and
large labeled datasets. The dependence on abundant data prevents NLP models from being …

CLINE: Contrastive learning with semantic negative examples for natural language understanding

D Wang, N Ding, P Li, HT Zheng - arXiv preprint arXiv:2107.00440, 2021 - arxiv.org
Although pre-trained language models have proven useful for learning high-quality semantic
representations, these models are still vulnerable to simple perturbations. Recent works …

A survey of race, racism, and anti-racism in NLP

A Field, SL Blodgett, Z Waseem, Y Tsvetkov - arXiv preprint arXiv …, 2021 - arxiv.org
Despite inextricable ties between race and language, little work has considered race in NLP
research and development. In this work, we survey 79 papers from the ACL anthology that …

Tree of attacks: Jailbreaking black-box LLMs automatically

A Mehrotra, M Zampetakis, P Kassianik… - arXiv preprint arXiv …, 2023 - ciso2ciso.com
While Large Language Models (LLMs) display versatile functionality, they continue
to generate harmful, biased, and toxic content, as demonstrated by the prevalence of …