A Nearly-Optimal Bound for Fast Regression with $\ell_\infty$ Guarantee

Z Song, M Ye, J Yin, L Zhang - International Conference on …, 2023 - proceedings.mlr.press
Given a matrix $A \in \mathbb{R}^{n \times d}$ and a vector $b \in \mathbb{R}^n$, we
consider the regression problem with $\ell_\infty$ guarantees: finding a vector …

Solving attention kernel regression problem via pre-conditioner

Z Song, J Yin, L Zhang - International Conference on …, 2024 - proceedings.mlr.press
Attention mechanism is the key to large language models, and attention matrix serves as an
algorithmic and computational bottleneck for such a scheme. In this paper, we define two …

Gradientcoin: A peer-to-peer decentralized large language models

Y Gao, Z Song, J Yin - arXiv preprint arXiv:2308.10502, 2023 - arxiv.org
Since 2008, after the proposal of a Bitcoin electronic cash system, Bitcoin has fundamentally
changed the economic system over the last decade. Since 2022, large language models …

An improved sample complexity for rank-1 matrix sensing

Y Deng, Z Li, Z Song - arXiv preprint arXiv:2303.06895, 2023 - arxiv.org
Matrix sensing is a problem in signal processing and machine learning that involves
recovering a low-rank matrix from a set of linear measurements. The goal is to reconstruct …

A sublinear adversarial training algorithm

Y Gao, L Qin, Z Song, Y Wang - arXiv preprint arXiv:2208.05395, 2022 - arxiv.org
Adversarial training is a widely used strategy for making neural networks resistant to
adversarial perturbations. For a neural network of width $m$, $n$ input training data in $d$ …

Randomized and deterministic attention sparsification algorithms for over-parameterized feature dimension

Y Deng, S Mahadevan, Z Song - arXiv preprint arXiv:2304.04397, 2023 - arxiv.org
Large language models (LLMs) have shown their power in different areas. Attention
computation, as an important subroutine of LLMs, has also attracted interests in theory …

Fast submodular function maximization

L Qin, Z Song, Y Wang - arXiv preprint arXiv:2305.08367, 2023 - arxiv.org
Submodular functions have many real-world applications, such as document summarization,
sensor placement, and image segmentation. For all these applications, the key building …

Accelerating Frank-Wolfe algorithm using low-dimensional and adaptive data structures

Z Song, Z Xu, Y Yang, L Zhang - arXiv preprint arXiv:2207.09002, 2022 - arxiv.org
In this paper, we study the problem of speeding up a type of optimization algorithms called
Frank-Wolfe, a conditional gradient method. We develop and employ two novel inner …

Adaptive and dynamic multi-resolution hashing for pairwise summations

L Qin, A Reddy, Z Song, Z Xu… - 2022 IEEE International …, 2022 - ieeexplore.ieee.org
In this paper, we propose Adam-Hash: an adaptive and dynamic multi-resolution hashing
data-structure for fast pairwise summation estimation. Given a data-set $X \subset \mathbb{R}^d$, a binary …

Fast distance oracles for any symmetric norm

Y Deng, Z Song, O Weinstein… - Advances in Neural …, 2022 - proceedings.neurips.cc
In the \emph{Distance Oracle} problem, the goal is to preprocess $n$ vectors
$x_1, x_2, \cdots, x_n$ in a $d$-dimensional normed space $(\mathbb{X}^d, \|\cdot\|_l)$ …