On the equivalence of different adaptive batch size selection strategies for stochastic gradient descent methods

L Espath, S Krumscheid, R Tempone… - arXiv preprint arXiv:2109.10933, 2021 - arxiv.org
In this study, we demonstrate that the norm test and inner product/orthogonality test presented in \cite{Bol18} are equivalent in terms of the convergence rates associated with Stochastic Gradient Descent (SGD) methods if $\epsilon^2 = \theta^2 + \nu^2$ with specific choices of $\theta$ and $\nu$. Here, $\epsilon$ controls the relative statistical error of the norm of the gradient, while $\theta$ and $\nu$ control the relative statistical error of the gradient in the direction of the gradient and in the direction orthogonal to the gradient, respectively. Furthermore, we demonstrate that the inner product/orthogonality test can be as inexpensive as the norm test in the best-case scenario if $\theta$ and $\nu$ are optimally selected, but the inner product/orthogonality test will never be more computationally affordable than the norm test if $\epsilon^2 = \theta^2 + \nu^2$. Finally, we present two stochastic optimization problems to illustrate our results.
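To make the adaptive batch-size idea concrete, below is a minimal sketch of the norm test from \cite{Bol18}: the current batch passes if the estimated variance of the mini-batch gradient is within a relative tolerance $\epsilon$ of the gradient norm, and otherwise a larger batch size is suggested. The function names and the assumption that per-sample gradients are available are illustrative choices, not part of the paper; practical implementations estimate the variance on the current mini-batch.

```python
import numpy as np

def norm_test_passes(per_sample_grads, eps):
    """Norm test: relative statistical error of the mini-batch gradient
    norm must not exceed eps, i.e.
        Var(g_i) / b <= eps^2 * ||g_bar||^2,
    where g_bar is the mini-batch gradient and b the batch size.

    per_sample_grads: (b, d) array of per-sample gradients g_i(x).
    """
    b = per_sample_grads.shape[0]
    g_bar = per_sample_grads.mean(axis=0)  # mini-batch gradient estimate
    # unbiased sample variance of the per-sample gradients (summed over d)
    var = ((per_sample_grads - g_bar) ** 2).sum() / (b - 1)
    return var / b <= eps**2 * np.dot(g_bar, g_bar)

def next_batch_size(per_sample_grads, eps):
    """If the norm test fails, return the smallest batch size for which
    the test would hold with the current variance estimate."""
    b = per_sample_grads.shape[0]
    g_bar = per_sample_grads.mean(axis=0)
    var = ((per_sample_grads - g_bar) ** 2).sum() / (b - 1)
    bound = eps**2 * np.dot(g_bar, g_bar)
    return b if var / b <= bound else int(np.ceil(var / bound))
```

The inner product/orthogonality test replaces the single tolerance $\epsilon$ with separate tolerances $\theta$ (along the gradient direction) and $\nu$ (orthogonal to it); the paper's point is that under $\epsilon^2 = \theta^2 + \nu^2$ the two criteria yield the same SGD convergence rates.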