The closeness of in-context learning and weight shifting for softmax regression

S Li, Z Song, Y Xia, T Yu, T Zhou - arXiv preprint arXiv:2304.13276, 2023 - arxiv.org
Large language models (LLMs) are known for their exceptional performance in natural
language processing, making them highly effective in many human life-related or even job …

Superiority of softmax: Unveiling the performance edge over linear attention

Y Deng, Z Song, T Zhou - arXiv preprint arXiv:2310.11685, 2023 - arxiv.org
Large transformer models have achieved state-of-the-art results in numerous natural
language processing tasks. Among the pivotal components of the transformer architecture …

The fine-grained complexity of gradient computation for training large language models

J Alman, Z Song - arXiv preprint arXiv:2402.04497, 2024 - arxiv.org
Large language models (LLMs) have made fundamental contributions over the last few
years. To train an LLM, one needs to alternatingly run 'forward' computations and 'backward' …

A theoretical insight into attack and defense of gradient leakage in transformer

C Li, Z Song, W Wang, C Yang - arXiv preprint arXiv:2311.13624, 2023 - arxiv.org
The Deep Leakage from Gradient (DLG) attack has emerged as a prevalent and highly
effective method for extracting sensitive training data by inspecting exchanged gradients …

Faster robust tensor power method for arbitrary order

Y Deng, Z Song, J Yin - arXiv preprint arXiv:2306.00406, 2023 - arxiv.org
Tensor decomposition is a fundamental method used in various areas to deal with
high-dimensional data. Tensor power method (TPM) is one of the widely-used techniques …

One Pass Streaming Algorithm for Super Long Token Attention Approximation in Sublinear Space

R Addanki, C Li, Z Song, C Yang - arXiv preprint arXiv:2311.14652, 2023 - arxiv.org
Deploying Large Language Models (LLMs) in streaming applications that involve long
contexts, particularly for extended dialogues and text analysis, is of paramount importance …

Fast heavy inner product identification between weights and inputs in neural network training

L Qin, S Mitra, Z Song, Y Yang… - 2023 IEEE International …, 2023 - ieeexplore.ieee.org
In this paper, we consider a heavy inner product identification problem, which generalizes
the Light Bulb problem (1): Given two sets A ⊂ {-1,+1}^d and B ⊂ {-1,+1}^d with |A| = |B| = n, if …

Natural Language Processing Combined with Digital Twins to Drive Fusion Physics Design

Z Yuan, P Liu, H Mao, G Liu, W Xiong… - Proceedings of the 2023 …, 2023 - dl.acm.org
Nuclear fusion has long attracted attention as a clean, efficient and sustainable form of
energy. However, complex physical design and engineering implementation are required to …