Analyzing leakage of personally identifiable information in language models

N Lukas, A Salem, R Sim, S Tople… - … IEEE Symposium on …, 2023 - ieeexplore.ieee.org
Language Models (LMs) have been shown to leak information about training data through
sentence-level membership inference and reconstruction attacks. Understanding the risk of …
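
The reconstruction-style probing studied in this line of work can be illustrated with a minimal sketch (assumptions: a HuggingFace causal LM such as GPT-2 and a hypothetical prefix/target pair; this is a generic memorization probe, not the attack evaluated in the paper):

```python
# Generic sketch: test whether a causal LM reproduces a PII string verbatim
# when prompted with the context that preceded it in the (assumed) training data.
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

MODEL_NAME = "gpt2"  # placeholder; any causal LM checkpoint works
tok = AutoTokenizer.from_pretrained(MODEL_NAME)
lm = AutoModelForCausalLM.from_pretrained(MODEL_NAME).eval()

def greedy_continuation(prefix: str, max_new_tokens: int = 20) -> str:
    """Return the model's greedy continuation of a prefix."""
    ids = tok(prefix, return_tensors="pt").input_ids
    with torch.no_grad():
        out = lm.generate(ids, max_new_tokens=max_new_tokens, do_sample=False)
    return tok.decode(out[0, ids.shape[1]:], skip_special_tokens=True)

prefix = "Contact John Doe at"         # hypothetical training-data prefix
target_pii = "john.doe@example.com"    # hypothetical PII string to look for
print(target_pii in greedy_continuation(prefix))  # True would indicate verbatim leakage
```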

Are large pre-trained language models leaking your personal information?

J Huang, H Shao, KCC Chang - arXiv preprint arXiv:2205.12628, 2022 - arxiv.org
In this paper, we analyze whether Pre-Trained Language Models (PLMs) are prone to leaking personal …

Quantifying privacy risks of masked language models using membership inference attacks

F Mireshghallah, K Goyal, A Uniyal… - arXiv preprint arXiv …, 2022 - arxiv.org
The wide adoption and application of Masked Language Models (MLMs) on sensitive data
(from legal to medical) necessitates a thorough quantitative investigation into their privacy …
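
A simple loss-threshold membership signal for MLMs can be sketched as follows (a generic baseline assuming a HuggingFace BERT-style checkpoint and an arbitrary threshold; the attacks studied in this work, such as likelihood-ratio variants, are more refined):

```python
# Generic sketch: lower masked-token loss (pseudo-perplexity) is treated as
# weak evidence that a sequence was part of the MLM's training data.
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

MODEL_NAME = "bert-base-uncased"  # placeholder MLM checkpoint
tok = AutoTokenizer.from_pretrained(MODEL_NAME)
mlm = AutoModelForMaskedLM.from_pretrained(MODEL_NAME).eval()

def pseudo_nll(text: str) -> float:
    """Average negative log-likelihood, masking one token position at a time."""
    ids = tok(text, return_tensors="pt").input_ids[0]
    losses = []
    for i in range(1, len(ids) - 1):  # skip [CLS] and [SEP]
        masked = ids.clone()
        masked[i] = tok.mask_token_id
        labels = torch.full_like(ids, -100)  # -100 = ignore position in the loss
        labels[i] = ids[i]
        with torch.no_grad():
            out = mlm(input_ids=masked.unsqueeze(0), labels=labels.unsqueeze(0))
        losses.append(out.loss.item())
    return sum(losses) / max(len(losses), 1)

THRESHOLD = 3.0  # in practice calibrated on known non-member data
sample = "Patient was prescribed 40 mg of atorvastatin."  # hypothetical candidate
print("member" if pseudo_nll(sample) < THRESHOLD else "non-member")
```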

Downstream task performance of BERT models pre-trained using automatically de-identified clinical data

T Vakili, A Lamproudis, A Henriksson… - Proceedings of the …, 2022 - aclanthology.org
Automatic de-identification is a cost-effective and straightforward way of removing large
amounts of personally identifiable information from large and sensitive corpora. However …
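
As a toy illustration of what automatic de-identification does (a minimal regex-based sketch with a few assumed PII patterns; systems of the kind studied here typically rely on trained NER taggers rather than hand-written rules):

```python
# Generic sketch: replace matched PII spans with category placeholders.
import re

PATTERNS = {
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
    "PHONE": re.compile(r"\b\d{3}[-.\s]?\d{3}[-.\s]?\d{4}\b"),
    "SSN":   re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def deidentify(text: str) -> str:
    """Substitute every matched PII span with a [CATEGORY] placeholder."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text

print(deidentify("Reach the patient at 555-123-4567 or jane.roe@clinic.org"))
# -> "Reach the patient at [PHONE] or [EMAIL]"
```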

Do not give away my secrets: Uncovering the privacy issue of neural code completion tools

Y Huang, Y Li, W Wu, J Zhang, MR Lyu - arXiv preprint arXiv:2309.07639, 2023 - arxiv.org
Neural Code Completion Tools (NCCTs) have reshaped the field of software development by accurately suggesting contextually relevant code snippets, benefiting from language …

Disclosure control of machine learning models from trusted research environments (TRE): New challenges and opportunities

E Mansouri-Benssassi, S Rogers, S Reel, M Malone… - Heliyon, 2023 - cell.com
Artificial intelligence (AI) applications in healthcare and medicine have increased in recent years. To enable access to personal data, Trusted Research …

Digger: Detecting copyright content mis-usage in large language model training

H Li, G Deng, Y Liu, K Wang, Y Li, T Zhang… - arXiv preprint arXiv …, 2024 - arxiv.org
Pre-training, which utilizes extensive and varied datasets, is a critical factor in the success of
Large Language Models (LLMs) across numerous applications. However, the detailed …

Membership inference attacks with token-level deduplication on Korean language models

MG Oh, LH Park, J Kim, J Park, T Kwon - IEEE Access, 2023 - ieeexplore.ieee.org
The confidentiality threat against training data has become a significant security problem in
neural language models. Recent studies have shown that memorized training data can be …
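
Deduplication of training text is a common memorization mitigation, and a token-level variant can be sketched roughly as below (a generic sketch that drops sequences whose token n-grams were already seen; how deduplication is actually used inside the paper's attack may differ):

```python
# Generic sketch of token-level n-gram deduplication over tokenized sequences.
def dedup_by_ngrams(sequences: list[list[int]], n: int = 8) -> list[list[int]]:
    """Keep only sequences that share no token n-gram with previously kept ones."""
    seen: set[tuple[int, ...]] = set()
    kept = []
    for seq in sequences:
        grams = {tuple(seq[i:i + n]) for i in range(max(len(seq) - n + 1, 0))}
        if grams.isdisjoint(seen):
            kept.append(seq)
            seen |= grams
    return kept

corpus = [[1, 2, 3, 4, 5, 6, 7, 8, 9],
          [1, 2, 3, 4, 5, 6, 7, 8, 9],   # exact duplicate, dropped
          [9, 8, 7, 6, 5, 4, 3, 2, 1]]
print(len(dedup_by_ngrams(corpus)))  # 2
```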

End-to-end pseudonymization of fine-tuned clinical BERT models: Privacy preservation with maintained data utility

T Vakili, A Henriksson, H Dalianis - BMC Medical Informatics and Decision …, 2024 - Springer
Many state-of-the-art results in natural language processing (NLP) rely on large pre-trained
language models (PLMs). These models consist of large numbers of parameters that are …
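
The core idea of pseudonymization, replacing detected identifiers with realistic surrogates rather than deleting them, can be sketched as follows (assumed surrogate list and a pre-detected entity list, e.g. from an NER step; not the paper's end-to-end pipeline):

```python
# Generic sketch: map every occurrence of the same entity to the same surrogate.
import hashlib

SURROGATE_NAMES = ["Alex Smith", "Sam Jones", "Kim Lee", "Pat Brown"]  # assumed list

def surrogate_for(entity: str) -> str:
    """Deterministically pick a surrogate so repeated mentions stay consistent."""
    digest = int(hashlib.sha256(entity.encode()).hexdigest(), 16)
    return SURROGATE_NAMES[digest % len(SURROGATE_NAMES)]

def pseudonymize(text: str, entities: list[str]) -> str:
    """Replace each detected entity string with its surrogate."""
    for ent in entities:
        text = text.replace(ent, surrogate_for(ent))
    return text

print(pseudonymize("John Doe was admitted; John Doe's labs were normal.", ["John Doe"]))
```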

Using membership inference attacks to evaluate privacy-preserving language modeling fails for pseudonymizing data

T Vakili, H Dalianis - Proceedings of the 24th Nordic Conference …, 2023 - aclanthology.org
Large pre-trained language models dominate the current state-of-the-art for many natural
language processing applications, including the field of clinical NLP. Several studies have …