Y Deng, Z Li, Z Song - arXiv preprint arXiv:2304.10411, 2023 - arxiv.org
Large language models (LLMs) have brought transformative changes to human society. One of the key computations in LLMs is the softmax unit. This operation is important in LLMs …
Large language models (LLMs) have made fundamental changes in human life. The attention scheme is one of the key components in all LLMs, such as BERT, GPT-1 …
We consider the problem of training a multi-layer over-parametrized neural network to minimize the empirical risk induced by a loss function. In the typical setting of over …
Y Gu, Z Song - arXiv preprint arXiv:2211.06033, 2022 - arxiv.org
Semidefinite programming is a fundamental tool in optimization and theoretical computer science. It has been extensively used as a black-box for solving many problems, such as …
Recent advances by practitioners in the deep learning community have breathed new life into Locality Sensitive Hashing (LSH), using it to reduce memory and time bottlenecks in …
In this work, we improved the analysis of the running time of SparseGPT [Frantar, Alistarh ICML 2023] from $O(d^{3})$ to $O(d^{\omega} + d^{2+a+o(1)} + d^{1+\omega(1,1,a)-a})$ …
Soft prompt tuning achieves superior performance across a wide range of few-shot tasks. However, the performance of prompt tuning can be highly sensitive to the initialization of …
S Jiang, Z Song, O Weinstein, H Zhang - Proceedings of the 53rd Annual …, 2021 - dl.acm.org
The fastest known LP solver for general (dense) linear programs is due to [Cohen, Lee and Song'19] and runs in $O^*(n^{\omega} + n^{2.5-\alpha/2} + n^{2+1/6})$ time. A number of follow-up works [Lee …
Z Song, X Yang, Y Yang… - … Conference on Machine …, 2023 - proceedings.mlr.press
Projection maintenance is one of the core data structure tasks. Efficient data structures for projection maintenance have led to recent breakthroughs in many convex programming …