Text embeddings reveal (almost) as much as text

JX Morris, V Kuleshov, V Shmatikov… - arXiv preprint arXiv …, 2023 - arxiv.org
How much private information do text embeddings reveal about the original text? We
investigate the problem of embedding inversion, reconstructing the full text …

Language model inversion

JX Morris, W Zhao, JT Chiu, V Shmatikov… - arXiv preprint arXiv …, 2023 - arxiv.org
Language models produce a distribution over the next token; can we use this information to
recover the prompt tokens? We consider the problem of language model inversion and …

Archilles' Heel in Semi-open LLMs: Hiding Bottom against Recovery Attacks

H Huang, Y Li, B Jiang, L Liu, R Sun, Z Liu… - arXiv preprint arXiv …, 2024 - arxiv.org
Closed-source large language models deliver strong performance but have limited
downstream customizability. Semi-open models, combining both closed-source and public …

DORY: Deliberative Prompt Recovery for LLM

L Gao, R Peng, Y Zhang, J Zhao - arXiv preprint arXiv:2405.20657, 2024 - arxiv.org
Prompt recovery in large language models (LLMs) is crucial for understanding how LLMs
work and addressing concerns regarding privacy, copyright, etc. The trend towards …

Is Efficient PAC Learning Possible with an Oracle That Responds 'Yes' or 'No'?

C Daskalakis, N Golowich - arXiv preprint arXiv:2406.11667, 2024 - arxiv.org
The empirical risk minimization (ERM) principle has been highly impactful in machine
learning, leading both to near-optimal theoretical guarantees for ERM-based learning …

Can't Hide Behind the API: Stealing Black-Box Commercial Embedding Models

MS Tamber, J Xian, J Lin - arXiv preprint arXiv:2406.09355, 2024 - arxiv.org
Embedding models that generate representation vectors from natural language text are
widely used, reflect substantial investments, and carry significant commercial value …

ESpeW: Robust Copyright Protection for LLM-based EaaS via Embedding-Specific Watermark

Z Wang, B Wu, J Deng, Y Yang - arXiv preprint arXiv:2410.17552, 2024 - arxiv.org
Embeddings as a Service (EaaS) is emerging as a crucial role in AI applications.
Unfortunately, EaaS is vulnerable to model extraction attacks, highlighting the urgent need …