The lazy neuron phenomenon: On emergence of activation sparsity in transformers

Z Li, C You, S Bhojanapalli, D Li, AS Rawat… - arXiv preprint arXiv …, 2022 - arxiv.org
This paper studies the curious phenomenon that the activation maps of machine learning
models with Transformer architectures are sparse. By activation map we refer to the …

Bypass exponential time preprocessing: Fast neural network training via weight-data correlation preprocessing

J Alman, Z Song, R Zhang… - Advances in Neural …, 2024 - proceedings.neurips.cc
Over the last decade, deep neural networks have transformed our society, and they are
already widely applied in various machine learning applications. State-of-the-art deep …

Training multi-layer over-parametrized neural network in subquadratic time

Z Song, L Zhang, R Zhang - arXiv preprint arXiv:2112.07628, 2021 - arxiv.org
We consider the problem of training a multi-layer over-parametrized neural network to
minimize the empirical risk induced by a loss function. In the typical setting of over …

Efficient asynchronize stochastic gradient algorithm with structured data

Z Song, M Ye - arXiv preprint arXiv:2305.08001, 2023 - arxiv.org
Deep learning has achieved impressive success in a variety of fields because of its good
generalization. However, it has been a challenging problem to quickly train a neural network …