Explainable AI: A review of machine learning interpretability methods

P Linardatos, V Papastefanopoulos, S Kotsiantis - Entropy, 2020 - mdpi.com
Recent advances in artificial intelligence (AI) have led to its widespread industrial adoption,
with machine learning systems demonstrating superhuman performance in a significant …

A survey of adversarial defenses and robustness in NLP

S Goyal, S Doddapaneni, MM Khapra… - ACM Computing …, 2023 - dl.acm.org
In the past few years, it has become increasingly evident that deep neural networks are not
resilient enough to withstand adversarial perturbations in input data, leaving them …

Can large language models be an alternative to human evaluations?

CH Chiang, H Lee - arXiv preprint arXiv:2305.01937, 2023 - arxiv.org
Human evaluation is indispensable and inevitable for assessing the quality of texts
generated by machine learning models or written by humans. However, human evaluation is …

Explainability for large language models: A survey

H Zhao, H Chen, F Yang, N Liu, H Deng, H Cai… - ACM Transactions on …, 2024 - dl.acm.org
Large language models (LLMs) have demonstrated impressive capabilities in natural
language processing. However, their internal mechanisms are still unclear and this lack of …

Auditing large language models: a three-layered approach

J Mökander, J Schuett, HR Kirk, L Floridi - AI and Ethics, 2023 - Springer
Large language models (LLMs) represent a major advance in artificial intelligence (AI)
research. However, the widespread use of LLMs is also coupled with significant ethical and …

On the opportunities and risks of foundation models

R Bommasani, DA Hudson, E Adeli, R Altman… - arXiv preprint arXiv …, 2021 - arxiv.org
AI is undergoing a paradigm shift with the rise of models (e.g., BERT, DALL-E, GPT-3) that are
trained on broad data at scale and are adaptable to a wide range of downstream tasks. We …

A survey on text classification: From traditional to deep learning

Q Li, H Peng, J Li, C Xia, R Yang, L Sun… - ACM Transactions on …, 2022 - dl.acm.org
Text classification is the most fundamental and essential task in natural language
processing. The last decade has seen a surge of research in this area due to the …

SneakyPrompt: Jailbreaking text-to-image generative models

Y Yang, B Hui, H Yuan, N Gong… - 2024 IEEE symposium on …, 2024 - ieeexplore.ieee.org
Text-to-image generative models such as Stable Diffusion and DALL·E raise many ethical
concerns due to the generation of harmful images such as Not-Safe-for-Work (NSFW) ones …

An extensive study on pre-trained models for program understanding and generation

Z Zeng, H Tan, H Zhang, J Li, Y Zhang… - Proceedings of the 31st …, 2022 - dl.acm.org
Automatic program understanding and generation techniques could significantly advance
the productivity of programmers and have been widely studied by academia and industry …

Adversarial GLUE: A multi-task benchmark for robustness evaluation of language models

B Wang, C Xu, S Wang, Z Gan, Y Cheng, J Gao… - arXiv preprint arXiv …, 2021 - arxiv.org
Large-scale pre-trained language models have achieved tremendous success across a
wide range of natural language understanding (NLU) tasks, even surpassing human …