Red teaming ChatGPT via jailbreaking: Bias, robustness, reliability and toxicity

R Mao, G Chen, X Zhang, F Guerin… - arXiv preprint arXiv …, 2023 - arxiv.org

The emergence of ChatGPT has generated much speculation in the press about its potential
to disrupt social and economic systems. Its astonishing language ability has aroused strong …

Cited by 52 · Related articles · All 4 versions

[PDF] ieee.org

Exploring ChatGPT Capabilities and Limitations: A Survey

A Koubaa, W Boulila, L Ghouti, A Alzahem… - IEEE Access, 2023 - ieeexplore.ieee.org

ChatGPT, a groundbreaking natural language processing technology released a few
months ago, has attracted significant attention due to its remarkable capabilities. This AI …

Cited by 20 · Related articles · All 2 versions

[PDF] arxiv.org

AutoDAN: Generating stealthy jailbreak prompts on aligned large language models

X Liu, N Xu, M Chen, C Xiao - arXiv preprint arXiv:2310.04451, 2023 - arxiv.org

The aligned Large Language Models (LLMs) are powerful language understanding and
decision-making tools that are created through extensive alignment with human feedback …

Cited by 95 · Related articles · All 2 versions

[PDF] arxiv.org

Low-resource languages jailbreak GPT-4

ZX Yong, C Menghini, SH Bach - arXiv preprint arXiv:2310.02446, 2023 - arxiv.org

AI safety training and red-teaming of large language models (LLMs) are measures to
mitigate the generation of unsafe content. Our work exposes the inherent cross-lingual …

Cited by 70 · Related articles · All 3 versions

[PDF] arxiv.org

A survey of large language models for healthcare: from data, technology, and applications to accountability and ethics

K He, R Mao, Q Lin, Y Ruan, X Lan, M Feng… - arXiv preprint arXiv …, 2023 - arxiv.org

The utilization of large language models (LLMs) in the Healthcare domain has generated
both excitement and concern due to their ability to effectively respond to free-text queries with …

Cited by 55 · Related articles · All 2 versions

[PDF] arxiv.org

Consistency analysis of ChatGPT

ME Jang, T Lukasiewicz - arXiv preprint arXiv:2303.06273, 2023 - arxiv.org

ChatGPT has gained a huge popularity since its introduction. Its positive aspects have been
reported through many media platforms, and some analyses even showed that ChatGPT …

Cited by 44 · Related articles · All 6 versions

[PDF] acm.org

CodeHelp: Using large language models with guardrails for scalable support in programming classes

M Liffiton, BE Sheese, J Savelka, P Denny - Proceedings of the 23rd Koli …, 2023 - dl.acm.org

Computing educators face significant challenges in providing timely support to students,
especially in large class settings. Large language models (LLMs) have emerged recently …

Cited by 45 · Related articles · All 4 versions

[PDF] arxiv.org

DetectLLM: Leveraging log rank information for zero-shot detection of machine-generated text

J Su, TY Zhuo, D Wang, P Nakov - arXiv preprint arXiv:2306.05540, 2023 - arxiv.org

With the rapid progress of large language models (LLMs) and the huge amount of text they
generated, it becomes more and more impractical to manually distinguish whether a text is …

Cited by 33 · Related articles · All 7 versions

[PDF] arxiv.org

From Instructions to Intrinsic Human Values--A Survey of Alignment Goals for Big Models

J Yao, X Yi, X Wang, J Wang, X Xie - arXiv preprint arXiv:2308.12014, 2023 - arxiv.org

Big models, exemplified by Large Language Models (LLMs), are models typically
pre-trained on massive data and comprised of enormous parameters, which not only obtain …

Cited by 23 · Related articles · All 2 versions

[PDF] mit.edu

Automatically Correcting Large Language Models: Surveying the Landscape of Diverse Automated Correction Strategies

L Pan, M Saxon, W Xu, D Nathani, X Wang… - Transactions of the …, 2024 - direct.mit.edu

While large language models (LLMs) have shown remarkable effectiveness in various NLP
tasks, they are still prone to issues such as hallucination, unfaithful reasoning, and toxicity. A …

Cited by 3 · Related articles · All 4 versions
