We provide the first global optimization landscape analysis of Neural Collapse--an intriguing empirical phenomenon that arises in the last-layer classifiers and features of neural …
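The snippet is cut off here; for orientation, Neural Collapse is usually summarized by the centered last-layer class means forming a simplex equiangular tight frame (ETF). A minimal statement of that structure in generic notation (the symbols below are illustrative, not taken from the paper):

M = [\bar{h}_1 - h_G, \ldots, \bar{h}_K - h_G], \qquad M^\top M \propto \frac{K}{K-1}\Big(I_K - \tfrac{1}{K}\mathbf{1}_K \mathbf{1}_K^\top\Big),

where \bar{h}_k is the mean last-layer feature of class k, h_G the global feature mean, and K the number of classes.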
We develop and analyze MARINA: a new communication-efficient method for non-convex distributed learning over heterogeneous datasets. MARINA employs a novel communication …
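The abstract is truncated before the mechanism is described; MARINA is known for compressing gradient differences rather than gradients, with an occasional full synchronization. A single-node Python simplification of that idea (rand_k, marina_step, and all parameters are illustrative sketches, not the paper's pseudocode):

import numpy as np

def rand_k(v, k, rng):
    # Unbiased rand-k compressor: keep k random coordinates, rescale by d/k.
    out = np.zeros_like(v)
    idx = rng.choice(len(v), size=k, replace=False)
    out[idx] = v[idx] * (len(v) / k)
    return out

def marina_step(x, g, grad, gamma, p, k, rng):
    # Move along the current gradient estimate g, then refresh g:
    # with probability p send a full gradient (rare, expensive message),
    # otherwise send only a compressed gradient difference (frequent, cheap).
    x_new = x - gamma * g
    if rng.random() < p:
        g_new = grad(x_new)
    else:
        g_new = g + rand_k(grad(x_new) - grad(x), k, rng)
    return x_new, g_new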
This paper studies the complexity of finding approximate stationary points of nonconvex-strongly-concave (NC-SC) smooth minimax problems, in both general and averaged smooth …
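A standard way to formalize the setting named here, in generic notation rather than the paper's: the problem is \min_{x}\max_{y} f(x,y) with f(x,\cdot) being \mu-strongly concave, and stationarity is measured through the primal function

\Phi(x) = \max_{y} f(x,y), \qquad \text{an } \epsilon\text{-stationary point satisfies } \|\nabla \Phi(x)\| \le \epsilon,

where \nabla\Phi is well defined by Danskin's theorem thanks to the strong concavity in y.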
R Hu, Y Guo, Y Gong - IEEE Transactions on Mobile Computing, 2023 - ieeexplore.ieee.org
Federated learning (FL), which enables edge devices to collaboratively learn a shared model while keeping their training data local, has received great attention recently and can protect …
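The snippet stops before the protocol details; the canonical instance is FedAvg, in which devices train locally and a server averages their models. A minimal Python sketch under assumed helpers (local_update, the grad callable, and size-weighted averaging are illustrative choices, not taken from this paper):

import numpy as np

def local_update(w, data, grad, lr=0.1, steps=5):
    # Each device refines the shared model on its own data only;
    # raw training data never leaves the device.
    w = w.copy()
    for _ in range(steps):
        w -= lr * grad(w, data)
    return w

def fedavg_round(w_global, devices, grad):
    # One communication round: local training on every device, then a
    # server-side average weighted by each device's dataset size.
    updates = [local_update(w_global, d, grad) for d in devices]
    sizes = np.array([len(d) for d in devices], dtype=float)
    weights = sizes / sizes.sum()
    return sum(wt * u for wt, u in zip(weights, updates))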
Y Zheng, Q Hao, J Wang, C Gao, J Chen, D Jin… - ACM Computing …, 2024 - dl.acm.org
Developing smart cities is vital for ensuring sustainable development and improving human well-being. One critical aspect of building smart cities is designing intelligent methods to …
As science and engineering have become increasingly data-driven, the role of optimization has expanded to touch almost every stage of the data analysis pipeline, from signal and …
Y Pan, Y Li - arXiv preprint arXiv:2306.00204, 2023 - arxiv.org
While stochastic gradient descent (SGD) is still the most popular optimization algorithm in deep learning, adaptive algorithms such as Adam have established empirical advantages …
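For reference, the two update rules being contrasted, in their standard textbook forms (step size \eta, stochastic gradient g_t; not specific to this paper):

\text{SGD:}\quad x_{t+1} = x_t - \eta\, g_t

\text{Adam:}\quad m_t = \beta_1 m_{t-1} + (1-\beta_1)\, g_t,\quad v_t = \beta_2 v_{t-1} + (1-\beta_2)\, g_t^2,\quad x_{t+1} = x_t - \eta\,\frac{m_t/(1-\beta_1^t)}{\sqrt{v_t/(1-\beta_2^t)} + \epsilon}.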
S Chatterjee - arXiv preprint arXiv:2203.16462, 2022 - arxiv.org
This article presents a criterion for convergence of gradient descent to a global minimum, which is then used to show that gradient descent with proper initialization converges to a …
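The truncated snippet does not state the criterion itself. As a classical point of comparison only, and explicitly not the paper's condition, the Polyak–Łojasiewicz (PL) inequality is the textbook criterion under which gradient descent reaches a global minimum:

f(x) - f^\star \le \tfrac{1}{2\mu}\|\nabla f(x)\|^2 \ \ \forall x \quad\Longrightarrow\quad f(x_t) - f^\star \le (1 - \mu/L)^{t}\,(f(x_0) - f^\star)

for L-smooth f and step size 1/L.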
We develop and analyze DASHA: a new family of methods for nonconvex distributed optimization problems. When the local functions at the nodes have a finite-sum or an …
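The sentence is cut off at the structural assumption; the standard distributed objective such methods target, written in generic notation, is

\min_{x\in\mathbb{R}^d} f(x) = \frac{1}{n}\sum_{i=1}^{n} f_i(x), \qquad f_i(x) = \frac{1}{m}\sum_{j=1}^{m} f_{ij}(x),

with f_i held by node i and the inner sum giving the finite-sum case the abstract begins to name.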