Language models produce a distribution over the next token; can we use this information to recover the prompt tokens? We consider the problem of language model inversion and …
H Huang, Y Li, B Jiang, L Liu, R Sun, Z Liu… - arXiv preprint arXiv …, 2024 - arxiv.org
Closed-source large language models deliver strong performance but have limited downstream customizability. Semi-open models, combining both closed-source and public …
L Gao, R Peng, Y Zhang, J Zhao - arXiv preprint arXiv:2405.20657, 2024 - arxiv.org
Prompt recovery in large language models (LLMs) is crucial for understanding how LLMs work and addressing concerns regarding privacy, copyright, etc. The trend towards …
The empirical risk minimization (ERM) principle has been highly impactful in machine learning, leading both to near-optimal theoretical guarantees for ERM-based learning …
Embedding models that generate representation vectors from natural language text are widely used, reflect substantial investments, and carry significant commercial value …
Z Wang, B Wu, J Deng, Y Yang - arXiv preprint arXiv:2410.17552, 2024 - arxiv.org
Embeddings as a Service (EaaS) is emerging as a crucial component in AI applications. Unfortunately, EaaS is vulnerable to model extraction attacks, highlighting the urgent need …