A comprehensive survey on training acceleration for large machine learning models in IoT

H Wang, Z Qu, Q Zhou, H Zhang, B Luo… - IEEE Internet of …, 2021 - ieeexplore.ieee.org
Ever-growing artificial intelligence (AI) applications have greatly reshaped our world in
many areas, e.g., smart home, computer vision, and natural language processing. Behind …

Faster adaptive federated learning

X Wu, F Huang, Z Hu, H Huang - … of the AAAI conference on artificial …, 2023 - ojs.aaai.org
Federated learning has attracted increasing attention with the emergence of distributed data.
While many federated learning algorithms have been proposed for the non-convex …
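
As a rough illustration of the setting, the following is a minimal Python sketch of server-side adaptive federated optimization: clients run a few local SGD steps on a synthetic least-squares problem, and the server averages their model deltas and applies an Adam-style update to the resulting pseudo-gradient. The synthetic data, hyperparameters, and function names are illustrative assumptions, not the algorithm proposed in the paper.

import numpy as np

def local_sgd(w, X, y, lr=0.05, steps=5):
    # A few local SGD steps on this client's least-squares loss.
    for _ in range(steps):
        grad = 2 * X.T @ (X @ w - y) / len(y)
        w = w - lr * grad
    return w

rng = np.random.default_rng(0)
w_true = rng.normal(size=3)
clients = []
for _ in range(4):
    X = rng.normal(size=(20, 3))
    clients.append((X, X @ w_true + 0.1 * rng.normal(size=20)))

def mse(w):
    return np.mean([np.mean((X @ w - y) ** 2) for X, y in clients])

w, m, v = np.zeros(3), np.zeros(3), np.zeros(3)
for _ in range(100):
    deltas = [local_sgd(w, X, y) - w for X, y in clients]  # local work per client
    d = np.mean(deltas, axis=0)                            # server-side averaging
    m = 0.9 * m + 0.1 * d                                  # first moment of the pseudo-gradient
    v = 0.99 * v + 0.01 * d ** 2                           # second moment
    w = w + 0.2 * m / (np.sqrt(v) + 1e-8)                  # Adam-style server step
print(mse(np.zeros(3)), mse(w))  # loss before vs. after federated training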

Why are adaptive methods good for attention models?

J Zhang, SP Karimireddy, A Veit… - Advances in …, 2020 - proceedings.neurips.cc
While stochastic gradient descent (SGD) is still the de facto algorithm in deep learning,
adaptive methods like Clipped SGD/Adam have been observed to outperform SGD across …
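
The snippet mentions Clipped SGD; a minimal sketch of a globally clipped SGD step is given below, where the step size and clipping threshold are illustrative defaults rather than values from the paper.

import numpy as np

def clipped_sgd_step(w, grad, lr=0.1, clip=1.0):
    # Rescale the stochastic gradient whenever its norm exceeds the threshold,
    # which tames heavy-tailed gradient noise.
    norm = np.linalg.norm(grad)
    if norm > clip:
        grad = grad * (clip / norm)
    return w - lr * grad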

AdaGrad avoids saddle points

K Antonakopoulos, P Mertikopoulos… - International …, 2022 - proceedings.mlr.press
Adaptive first-order methods in optimization have widespread ML applications due to their
ability to adapt to non-convex landscapes. However, their convergence guarantees are …
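
For reference, the per-coordinate adaptivity of AdaGrad discussed here amounts to the following update; the step size and epsilon are illustrative defaults, and the paper's saddle-point analysis is not reproduced.

import numpy as np

def adagrad_step(w, grad, accum, lr=0.1, eps=1e-8):
    accum = accum + grad ** 2                      # running sum of squared gradients
    w = w - lr * grad / (np.sqrt(accum) + eps)     # per-coordinate step-size scaling
    return w, accum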

Deep equilibrium nets

M Azinovic, L Gaegauf… - International Economic …, 2022 - Wiley Online Library
We introduce deep equilibrium nets (DEQNs), a deep learning-based method to compute
approximate functional rational expectations equilibria of economic models featuring a …

Why Adam beats SGD for attention models

J Zhang, SP Karimireddy, A Veit, S Kim, SJ Reddi… - 2019 - openreview.net
While stochastic gradient descent (SGD) is still the de facto algorithm in deep learning,
adaptive methods like Adam have been observed to outperform SGD across important tasks …
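
For comparison with plain SGD, a standard Adam update with the usual bias correction looks as follows; the beta and epsilon values are the common defaults, not tuned settings from the paper.

import numpy as np

def adam_step(w, grad, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    m = b1 * m + (1 - b1) * grad                 # first-moment (momentum) estimate
    v = b2 * v + (1 - b2) * grad ** 2            # second-moment estimate
    m_hat = m / (1 - b1 ** t)                    # bias correction, t starts at 1
    v_hat = v / (1 - b2 ** t)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)  # per-coordinate adaptive step
    return w, m, v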

Explicit regularization in overparametrized models via noise injection

A Orvieto, A Raj, H Kersting… - … Conference on Artificial …, 2023 - proceedings.mlr.press
Injecting noise within gradient descent has several desirable features, such as smoothing
and regularizing properties. In this paper, we investigate the effects of injecting noise before …
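
A minimal sketch of the kind of scheme studied here, assuming the noise is injected into the parameters before the gradient evaluation while the step is still taken from the unperturbed iterate; the noise scale and step size are illustrative assumptions.

import numpy as np

def perturbed_gd_step(w, grad_fn, rng, lr=0.1, sigma=0.01):
    noise = sigma * rng.normal(size=w.shape)  # Gaussian perturbation of the parameters
    grad = grad_fn(w + noise)                 # gradient evaluated at the perturbed point
    return w - lr * grad                      # update applied to the unperturbed iterate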

Self-organizing radial basis function neural network using accelerated second-order learning algorithm

HG Han, ML Ma, HY Yang, JF Qiao - Neurocomputing, 2022 - Elsevier
Gradient-based algorithms are commonly used for training radial basis function neural
networks (RBFNNs). However, it remains difficult to avoid the vanishing-gradient problem and to improve the …
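
To make the terms concrete, a minimal RBFNN with Gaussian hidden units and a damped second-order (Levenberg-Marquardt-style) update of its output weights can be sketched as below; the fixed centers, width, and damping factor are illustrative assumptions and do not reflect the paper's self-organizing structure learning.

import numpy as np

def rbf_features(X, centers, width=1.0):
    # Gaussian activation for every (sample, center) pair.
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * width ** 2))

def lm_step(w_out, X, y, centers, damping=1e-2):
    Phi = rbf_features(X, centers)                    # Jacobian w.r.t. the output weights
    r = Phi @ w_out - y                               # residuals
    H = Phi.T @ Phi + damping * np.eye(Phi.shape[1])  # damped Gauss-Newton Hessian
    return w_out - np.linalg.solve(H, Phi.T @ r)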

Calibrating the adaptive learning rate to improve convergence of ADAM

Q Tong, G Liang, J Bi - Neurocomputing, 2022 - Elsevier
Adaptive gradient methods (AGMs) have been widely used to optimize nonconvex problems
in deep learning. We identify two aspects of AGMs that can be further improved …

Decentralized riemannian algorithm for nonconvex minimax problems

X Wu, Z Hu, H Huang - Proceedings of the AAAI Conference on Artificial …, 2023 - ojs.aaai.org
Minimax optimization over Riemannian manifolds (possibly with nonconvex constraints) has
been actively applied to solve many problems, such as robust dimensionality reduction and …
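
The basic manifold operation such algorithms build on can be sketched as a single Riemannian gradient step on the unit sphere: project the Euclidean gradient onto the tangent space, move, and retract back onto the manifold. This is a generic single-variable illustration, not the paper's decentralized minimax method.

import numpy as np

def sphere_step(x, egrad, lr=0.1):
    rgrad = egrad - (x @ egrad) * x   # project the Euclidean gradient onto the tangent space
    y = x - lr * rgrad                # gradient step in the tangent direction
    return y / np.linalg.norm(y)      # retraction: renormalize back onto the sphere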