Convex optimization for big data: Scalable, randomized, and parallel algorithms for big data analytics

V Cevher, S Becker, M Schmidt - IEEE Signal Processing …, 2014 - ieeexplore.ieee.org
This article reviews recent advances in convex optimization algorithms for big data, which
aim to reduce the computational, storage, and communications bottlenecks. We provide an …

Achieving geometric convergence for distributed optimization over time-varying graphs

A Nedic, A Olshevsky, W Shi - SIAM Journal on Optimization, 2017 - SIAM
This paper considers the problem of distributed optimization over time-varying graphs. For
the case of undirected graphs, we introduce a distributed algorithm, referred to as DIGing …
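
For intuition, the following is a minimal sketch of the gradient-tracking recursion DIGing is built on, x_{k+1} = W x_k - α y_k and y_{k+1} = W y_k + ∇f(x_{k+1}) - ∇f(x_k), under the simplifying assumption of a fixed doubly stochastic mixing matrix W (the paper's contribution is precisely that the scheme still converges geometrically when the graph, and hence W_k, varies over time). Function and variable names are illustrative, not from the paper.

```python
import numpy as np

def diging(grad_fns, W, x0, alpha=0.1, iters=200):
    """Gradient-tracking sketch in the spirit of DIGing.

    grad_fns : list of per-node gradient functions, grad_fns[i](x_i) -> ndarray
    W        : doubly stochastic mixing matrix (n x n), assumed fixed here
    x0       : (n, d) array of initial local iterates
    """
    n = len(grad_fns)
    x = x0.copy()
    g = np.stack([grad_fns[i](x[i]) for i in range(n)])
    y = g.copy()                        # tracker whose average follows the average gradient
    for _ in range(iters):
        x_next = W @ x - alpha * y      # consensus step minus tracked gradient
        g_next = np.stack([grad_fns[i](x_next[i]) for i in range(n)])
        y = W @ y + g_next - g          # gradient-tracking update
        x, g = x_next, g_next
    return x.mean(axis=0)

# Toy usage: n nodes, node i holds f_i(x) = 0.5 * ||x - b_i||^2, so the minimizer is mean(b).
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, d = 5, 3
    b = rng.normal(size=(n, d))
    grads = [lambda x, bi=b[i]: x - bi for i in range(n)]
    W = np.full((n, n), 1.0 / n)        # complete-graph averaging, a deliberate simplification
    x_hat = diging(grads, W, np.zeros((n, d)))
    print(np.allclose(x_hat, b.mean(axis=0), atol=1e-3))
```

The invariant behind the geometric rate is that the average of the y-variables always equals the current average of the local gradients, so each node effectively steps along an estimate of the global gradient.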

Coresets for data-efficient training of machine learning models

B Mirzasoleiman, J Bilmes… - … Conference on Machine …, 2020 - proceedings.mlr.press
Incremental gradient (IG) methods, such as stochastic gradient descent and its variants, are
commonly used for large scale optimization in machine learning. Despite the sustained effort …

Adaptive SGD with Polyak stepsize and line-search: Robust convergence and variance reduction

X Jiang, SU Stich - Advances in Neural Information …, 2024 - proceedings.neurips.cc
The recently proposed stochastic Polyak stepsize (SPS) and stochastic line-search (SLS) for
SGD have shown remarkable effectiveness when training over-parameterized models …
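
For reference, a minimal sketch of SGD with a stochastic Polyak stepsize, γ = min(f_i(x)/(c‖∇f_i(x)‖²), γ_max), assuming the per-sample optimal values f_i* are zero (the usual simplification for interpolating, over-parameterized models). The name sps_sgd and the constants below are illustrative.

```python
import numpy as np

def sps_sgd(loss_grad, data, x0, c=0.5, gamma_max=1.0, epochs=50, seed=0):
    """SGD with a stochastic Polyak stepsize (simplified sketch).

    loss_grad : function (x, sample) -> (f_i(x), grad f_i(x))
    Assumes f_i* = 0 for every sample, i.e. the model can fit each point exactly.
    """
    rng = np.random.default_rng(seed)
    x = x0.copy()
    for _ in range(epochs):
        for i in rng.permutation(len(data)):
            fi, gi = loss_grad(x, data[i])
            gamma = min(fi / (c * np.dot(gi, gi) + 1e-12), gamma_max)
            x -= gamma * gi
    return x

# Toy usage: consistent least squares, where (with c = 0.5) the step reduces to a Kaczmarz projection.
if __name__ == "__main__":
    rng = np.random.default_rng(1)
    A = rng.normal(size=(100, 10))
    x_true = rng.normal(size=10)
    samples = list(zip(A, A @ x_true))
    lg = lambda x, s: (0.5 * (s[0] @ x - s[1]) ** 2, (s[0] @ x - s[1]) * s[0])
    x_hat = sps_sgd(lg, samples, np.zeros(10))
    print(np.linalg.norm(x_hat - x_true) < 1e-3)
```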

[Book][B] Deep learning

Y Bengio, I Goodfellow, A Courville - 2017 - academia.edu
Inventors have long dreamed of creating machines that think. Ancient Greek myths tell of
intelligent objects, such as animated statues of human beings and tables that arrive full of …

Minimizing finite sums with the stochastic average gradient

M Schmidt, N Le Roux, F Bach - Mathematical Programming, 2017 - Springer
We analyze the stochastic average gradient (SAG) method for optimizing the sum of a finite
number of smooth convex functions. Like stochastic gradient (SG) methods, the SAG …
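
A minimal sketch of the gradient-memory idea behind SAG, assuming the last gradient of every summand can be kept in memory (O(nd) storage in this dense version); names are illustrative.

```python
import numpy as np

def sag(grad_i, n, x0, step, iters=100_000, seed=0):
    """Stochastic average gradient (SAG) sketch for min_x (1/n) sum_i f_i(x).

    grad_i : function (x, i) -> gradient of f_i at x
    Keeps one stored gradient per summand and steps along their average,
    refreshing only the sampled entry at each iteration.
    """
    rng = np.random.default_rng(seed)
    x = x0.copy()
    memory = np.zeros((n, x0.size))     # table of last-seen gradients
    avg = np.zeros(x0.size)             # running average of the table
    for _ in range(iters):
        i = rng.integers(n)
        g = grad_i(x, i)
        avg += (g - memory[i]) / n      # O(d) update of the average
        memory[i] = g
        x -= step * avg
    return x

# Toy usage: least squares with f_i(x) = 0.5 * (a_i @ x - y_i)^2.
if __name__ == "__main__":
    rng = np.random.default_rng(2)
    A = rng.normal(size=(200, 5))
    y = A @ rng.normal(size=5)
    gi = lambda x, i: (A[i] @ x - y[i]) * A[i]
    L_max = np.max(np.sum(A ** 2, axis=1))   # largest per-example Lipschitz constant
    x_hat = sag(gi, 200, np.zeros(5), step=1.0 / L_max)
    print(np.linalg.norm(A @ x_hat - y) < 1e-3)
```

Each iteration costs one gradient evaluation plus an O(d) update, like plain SGD, while the update direction aggregates information from all n summands, which is what buys the faster convergence on finite sums.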

Hogwild!: A lock-free approach to parallelizing stochastic gradient descent

B Recht, C Re, S Wright, F Niu - Advances in neural …, 2011 - proceedings.neurips.cc
Stochastic Gradient Descent (SGD) is a popular algorithm that can achieve state-of-
the-art performance on a variety of machine learning tasks. Several researchers have …
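
A minimal sketch of the lock-free update pattern, assuming a shared NumPy parameter vector and Python threads. In CPython the GIL serializes the bytecode, so this only illustrates the asynchronous, lock-free access pattern; the speedups reported in the paper come from native threads and sparse per-example updates.

```python
import threading
import numpy as np

def hogwild_sgd(A, y, n_threads=4, step=0.005, sweeps=50):
    """Lock-free parallel SGD sketch in the spirit of Hogwild!.

    All threads read and write the shared parameter vector x in place,
    with no locks, so reads may see partially updated (stale) values.
    """
    n, d = A.shape
    x = np.zeros(d)                              # shared parameters

    def worker(seed):
        rng = np.random.default_rng(seed)
        for _ in range(sweeps * n // n_threads):
            i = rng.integers(n)
            g = (A[i] @ x - y[i]) * A[i]         # read possibly stale x
            np.subtract(x, step * g, out=x)      # in-place, lock-free write

    threads = [threading.Thread(target=worker, args=(t,)) for t in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return x

if __name__ == "__main__":
    rng = np.random.default_rng(3)
    A = rng.normal(size=(1000, 10))
    y = A @ rng.normal(size=10)
    x_hat = hogwild_sgd(A, y)
    print(np.linalg.norm(A @ x_hat - y) / np.linalg.norm(y) < 1e-2)
```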

[Book][B] First-order methods in optimization

A Beck - 2017 - SIAM
This book, as the title suggests, is about first-order methods, namely, methods that exploit
information on values and gradients/subgradients (but not Hessians) of the functions …
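
As a concrete instance of the class of methods the book covers, here is a short proximal-gradient (ISTA) sketch for l1-regularized least squares: it uses only gradients of the smooth part and a proximal (soft-thresholding) step, never Hessians. The problem setup is illustrative.

```python
import numpy as np

def ista(A, y, lam=0.1, iters=5000):
    """Proximal gradient (ISTA) for min_x 0.5*||Ax - y||^2 + lam*||x||_1."""
    L = np.linalg.norm(A, 2) ** 2                # Lipschitz constant of the smooth gradient
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        g = A.T @ (A @ x - y)                    # gradient of the smooth part
        z = x - g / L
        x = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)   # soft-thresholding prox
    return x

if __name__ == "__main__":
    rng = np.random.default_rng(4)
    A = rng.normal(size=(50, 100))
    x_true = np.zeros(100)
    x_true[:5] = 1.0
    x_hat = ista(A, A @ x_true)
    print(np.count_nonzero(np.abs(x_hat) > 0.1))  # typically reports the 5-sparse support
```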

A stochastic quasi-Newton method for large-scale optimization

RH Byrd, SL Hansen, J Nocedal, Y Singer - SIAM Journal on Optimization, 2016 - SIAM
The question of how to incorporate curvature information into stochastic approximation
methods is challenging. The direct application of classical quasi-Newton updating …
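
The following is a deliberately simplified sketch in the spirit of stochastic quasi-Newton methods: it builds limited-memory curvature pairs from gradient differences evaluated on the same minibatch (an oLBFGS-style shortcut), whereas the paper's method uses subsampled Hessian-vector products at averaged iterates. All names are illustrative.

```python
import numpy as np
from collections import deque

def stochastic_lbfgs(grad_batch, x0, batches, step=0.5, iters=2000, memory=10, seed=0):
    """Simplified stochastic quasi-Newton sketch.

    grad_batch : function (x, b) -> gradient of the b-th minibatch objective at x
    batches    : number of minibatches
    """
    rng = np.random.default_rng(seed)
    x = x0.copy()
    pairs = deque(maxlen=memory)             # limited-memory (s, y) curvature history

    def two_loop(g):                         # standard L-BFGS two-loop recursion
        q, alphas = g.copy(), []
        for s, y in reversed(pairs):
            a = (s @ q) / (y @ s)
            alphas.append(a)
            q -= a * y
        if pairs:
            s, y = pairs[-1]
            q *= (y @ s) / (y @ y)           # initial inverse-Hessian scaling
        for (s, y), a in zip(pairs, reversed(alphas)):
            b = (y @ q) / (y @ s)
            q += (a - b) * s
        return q

    for _ in range(iters):
        b = rng.integers(batches)
        g = grad_batch(x, b)
        d = two_loop(g)                      # quasi-Newton direction
        x_new = x - step * d
        s, y = x_new - x, grad_batch(x_new, b) - g   # same minibatch -> consistent pair
        if y @ s > 1e-10:                    # keep only curvature-positive pairs
            pairs.append((s, y))
        x = x_new
    return x

# Toy usage: consistent least squares split into 20 minibatches.
if __name__ == "__main__":
    rng = np.random.default_rng(5)
    A = rng.normal(size=(1000, 20))
    y = A @ rng.normal(size=20)
    nb = 20
    gb = lambda x, b: A[b::nb].T @ (A[b::nb] @ x - y[b::nb]) / len(y[b::nb])
    x_hat = stochastic_lbfgs(gb, np.zeros(20), nb)
    print(np.linalg.norm(A @ x_hat - y) / np.linalg.norm(y) < 1e-2)
```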

A stochastic gradient method with an exponential convergence rate for finite training sets

N Le Roux, M Schmidt, F Bach - Advances in neural …, 2012 - proceedings.neurips.cc
We propose a new stochastic gradient method for optimizing the sum of a finite set of smooth
functions, where the sum is strongly convex. While standard stochastic gradient methods …
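
Schematically, the setting and the rate contrast at stake, with constants and precise smoothness/strong-convexity conditions omitted:

```latex
\[
  \min_{x \in \mathbb{R}^d} \; g(x) = \frac{1}{n} \sum_{i=1}^{n} f_i(x),
  \qquad g \ \text{strongly convex}, \quad f_i \ \text{smooth}.
\]
\[
  \text{plain SGD (decaying steps):} \quad \mathbb{E}\,[\,g(x_k)\,] - g(x^\ast) = O(1/k),
  \qquad
  \text{gradient-memory methods:} \quad \mathbb{E}\,[\,g(x_k)\,] - g(x^\ast) = O(\rho^k), \ \rho \in (0,1).
\]
```

The second, linear rate is the "exponential convergence" in the title, obtained while keeping the per-iteration cost at a single component gradient.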