Cyber-attack attribution is an important process that allows experts to put in place attacker-oriented countermeasures and legal actions. Analysts mainly perform attribution …
Abstention, the refusal of large language models (LLMs) to provide an answer, is increasingly recognized for its potential to mitigate hallucinations and enhance safety in …
J Yi, R Ye, Q Chen, B Zhu, S Chen, D Lian… - Findings of the …, 2024 - aclanthology.org
Large language models (LLMs) possess immense capabilities but are susceptible to malicious exploitation. To mitigate this risk, safety alignment is employed to align LLMs with …
Large language models (LLMs) have transformed the field of natural language processing, but they remain susceptible to jailbreaking attacks that exploit their capabilities to generate …
Recent studies show that Large Language Models (LLMs) with safety alignment can be jailbroken by fine-tuning on a dataset mixed with harmful data. For the first time in the literature, we …
The existing safety alignment of Large Language Models (LLMs) has been found to be fragile and can be easily attacked through different strategies, such as fine-tuning on a few harmful …
N Bhavsar, J Jordan, S Hakimov… - arXiv preprint arXiv …, 2024 - arxiv.org
What makes a good Large Language Model (LLM)? That it performs well on the relevant benchmarks, which hopefully measure, with some validity, the presence of capabilities that …
Large language models (LLMs) are often fine-tuned for use on downstream tasks, though this can degrade capabilities learned during previous training. This phenomenon, often …