A comprehensive survey on training acceleration for large machine learning models in IoT

H Wang, Z Qu, Q Zhou, H Zhang, B Luo… - IEEE Internet of …, 2021 - ieeexplore.ieee.org
Ever-growing artificial intelligence (AI) applications have greatly reshaped our world in
many areas, e.g., smart home, computer vision, and natural language processing. Behind …

Faster adaptive federated learning

X Wu, F Huang, Z Hu, H Huang - … of the AAAI conference on artificial …, 2023 - ojs.aaai.org
Federated learning has attracted increasing attention with the emergence of distributed data.
While many federated learning algorithms have been proposed for the non-convex …
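
As a rough illustration of the setting, the following is a minimal Python sketch of server-side adaptive federated optimization: clients run a few local SGD steps on a synthetic least-squares problem, and the server averages their model deltas and applies an Adam-style update to the resulting pseudo-gradient. The synthetic data, hyperparameters, and function names are illustrative assumptions, not the algorithm proposed in the paper.

import numpy as np

def local_sgd(w, X, y, lr=0.05, steps=5):
    # A few local SGD steps on this client's least-squares loss.
    for _ in range(steps):
        grad = 2 * X.T @ (X @ w - y) / len(y)
        w = w - lr * grad
    return w

rng = np.random.default_rng(0)
w_true = rng.normal(size=3)
clients = []
for _ in range(4):
    X = rng.normal(size=(20, 3))
    clients.append((X, X @ w_true + 0.1 * rng.normal(size=20)))

def mse(w):
    return np.mean([np.mean((X @ w - y) ** 2) for X, y in clients])

w, m, v = np.zeros(3), np.zeros(3), np.zeros(3)
for _ in range(100):
    deltas = [local_sgd(w, X, y) - w for X, y in clients]  # local work per client
    d = np.mean(deltas, axis=0)                            # server-side averaging
    m = 0.9 * m + 0.1 * d                                  # first moment of the pseudo-gradient
    v = 0.99 * v + 0.01 * d ** 2                           # second moment
    w = w + 0.2 * m / (np.sqrt(v) + 1e-8)                  # Adam-style server step
print(mse(np.zeros(3)), mse(w))  # loss before vs. after federated training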

Why are adaptive methods good for attention models?

J Zhang, SP Karimireddy, A Veit… - Advances in …, 2020 - proceedings.neurips.cc
While stochastic gradient descent (SGD) is still the de facto algorithm in deep learning,
adaptive methods like Clipped SGD/Adam have been observed to outperform SGD across …
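
The snippet mentions Clipped SGD; a minimal sketch of a globally clipped SGD step is given below, where the step size and clipping threshold are illustrative defaults rather than values from the paper.

import numpy as np

def clipped_sgd_step(w, grad, lr=0.1, clip=1.0):
    # Rescale the stochastic gradient whenever its norm exceeds the threshold,
    # which tames heavy-tailed gradient noise.
    norm = np.linalg.norm(grad)
    if norm > clip:
        grad = grad * (clip / norm)
    return w - lr * grad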

AdaGrad avoids saddle points

K Antonakopoulos, P Mertikopoulos… - International …, 2022 - proceedings.mlr.press
Adaptive first-order methods in optimization have widespread ML applications due to their
ability to adapt to non-convex landscapes. However, their convergence guarantees are …
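
For reference, the per-coordinate adaptivity of AdaGrad discussed here amounts to the following update; the step size and epsilon are illustrative defaults, and the paper's saddle-point analysis is not reproduced.

import numpy as np

def adagrad_step(w, grad, accum, lr=0.1, eps=1e-8):
    accum = accum + grad ** 2                      # running sum of squared gradients
    w = w - lr * grad / (np.sqrt(accum) + eps)     # per-coordinate step-size scaling
    return w, accum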

Deep equilibrium nets

M Azinovic, L Gaegauf… - International Economic …, 2022 - Wiley Online Library
We introduce deep equilibrium nets (DEQNs), a deep learning-based method to compute
approximate functional rational expectations equilibria of economic models featuring a …

Why Adam beats SGD for attention models

J Zhang, SP Karimireddy, A Veit, S Kim, SJ Reddi… - 2019 - openreview.net
While stochastic gradient descent (SGD) is still the de facto algorithm in deep learning,
adaptive methods like Adam have been observed to outperform SGD across important tasks …
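
For comparison with plain SGD, a standard Adam update with the usual bias correction looks as follows; the beta and epsilon values are the common defaults, not tuned settings from the paper.

import numpy as np

def adam_step(w, grad, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    m = b1 * m + (1 - b1) * grad                 # first-moment (momentum) estimate
    v = b2 * v + (1 - b2) * grad ** 2            # second-moment estimate
    m_hat = m / (1 - b1 ** t)                    # bias correction, t starts at 1
    v_hat = v / (1 - b2 ** t)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)  # per-coordinate adaptive step
    return w, m, v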

Explicit regularization in overparametrized models via noise injection

A Orvieto, A Raj, H Kersting… - … Conference on Artificial …, 2023 - proceedings.mlr.press
Injecting noise within gradient descent has several desirable features, such as smoothing
and regularizing properties. In this paper, we investigate the effects of injecting noise before …
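
A minimal sketch of the kind of scheme studied here, assuming the noise is injected into the parameters before the gradient evaluation while the step is still taken from the unperturbed iterate; the noise scale and step size are illustrative assumptions.

import numpy as np

def perturbed_gd_step(w, grad_fn, rng, lr=0.1, sigma=0.01):
    noise = sigma * rng.normal(size=w.shape)  # Gaussian perturbation of the parameters
    grad = grad_fn(w + noise)                 # gradient evaluated at the perturbed point
    return w - lr * grad                      # update applied to the unperturbed iterate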

Self-organizing radial basis function neural network using accelerated second-order learning algorithm

HG Han, ML Ma, HY Yang, JF Qiao - Neurocomputing, 2022 - Elsevier
Gradient-based algorithms are commonly used for training radial basis function neural
networks (RBFNNs). However, it remains difficult to avoid the vanishing-gradient problem and to improve the …
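
To make the terms concrete, a minimal RBFNN with Gaussian hidden units and a damped second-order (Levenberg-Marquardt-style) update of its output weights can be sketched as below; the fixed centers, width, and damping factor are illustrative assumptions and do not reflect the paper's self-organizing structure learning.

import numpy as np

def rbf_features(X, centers, width=1.0):
    # Gaussian activation for every (sample, center) pair.
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * width ** 2))

def lm_step(w_out, X, y, centers, damping=1e-2):
    Phi = rbf_features(X, centers)                    # Jacobian w.r.t. the output weights
    r = Phi @ w_out - y                               # residuals
    H = Phi.T @ Phi + damping * np.eye(Phi.shape[1])  # damped Gauss-Newton Hessian
    return w_out - np.linalg.solve(H, Phi.T @ r)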

Calibrating the adaptive learning rate to improve convergence of ADAM

Q Tong, G Liang, J Bi - Neurocomputing, 2022 - Elsevier
Adaptive gradient methods (AGMs) have been widely used to optimize nonconvex problems
in deep learning. We identify two aspects of AGMs that can be further improved …

Decentralized riemannian algorithm for nonconvex minimax problems

X Wu, Z Hu, H Huang - Proceedings of the AAAI Conference on Artificial …, 2023 - ojs.aaai.org
Minimax optimization over Riemannian manifolds (possibly with nonconvex constraints) has
been actively applied to solve many problems, such as robust dimensionality reduction and …
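
The basic manifold operation such algorithms build on can be sketched as a single Riemannian gradient step on the unit sphere: project the Euclidean gradient onto the tangent space, move, and retract back onto the manifold. This is a generic single-variable illustration, not the paper's decentralized minimax method.

import numpy as np

def sphere_step(x, egrad, lr=0.1):
    rgrad = egrad - (x @ egrad) * x   # project the Euclidean gradient onto the tangent space
    y = x - lr * rgrad                # gradient step in the tangent direction
    return y / np.linalg.norm(y)      # retraction: renormalize back onto the sphere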