DESCINet: A hierarchical deep convolutional neural network with skip connection for long time series forecasting

AQB Silva, WN Gonçalves, ET Matsubara - Expert Systems with …, 2023 - Elsevier
Time series forecasting is the process of predicting future values of a time series from
knowledge of its past data. Although there are several models for making short-term …
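
The snippet's framing of forecasting as predicting future values from past data can be made concrete with a sliding-window setup. The sketch below is a generic illustration, not the DESCINet pipeline; window lengths are arbitrary.

```python
# Minimal sketch (not the cited model): frame a univariate series as a
# supervised problem by pairing each past window with the values to predict.
import numpy as np

def make_windows(series, lookback, horizon):
    """Split a 1-D series into (past window, future target) pairs."""
    X, y = [], []
    for t in range(len(series) - lookback - horizon + 1):
        X.append(series[t : t + lookback])                        # past values
        y.append(series[t + lookback : t + lookback + horizon])   # future values
    return np.array(X), np.array(y)

series = np.sin(np.linspace(0, 20, 500))          # toy series
X, y = make_windows(series, lookback=48, horizon=12)
print(X.shape, y.shape)                           # (441, 48) (441, 12)
```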

[HTML] High-dimensional dynamics of generalization error in neural networks

MS Advani, AM Saxe, H Sompolinsky - Neural Networks, 2020 - Elsevier
We perform an analysis of the average generalization dynamics of large neural networks
trained using gradient descent. We study the practically-relevant “high-dimensional” regime …

Skip connections eliminate singularities

AE Orhan, X Pitkow - arXiv preprint arXiv:1701.09175, 2017 - arxiv.org
Skip connections made the training of very deep networks possible and have become an
indispensable component in a variety of neural architectures. A completely satisfactory …
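
For reference, the skip-connection idea the snippet refers to is an identity path added around a nonlinear transformation. A minimal PyTorch sketch follows; the layer sizes are illustrative and do not come from the cited work.

```python
# Minimal sketch of a residual (skip-connection) block: the identity path
# lets activations and gradients bypass the nonlinear body.
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
        )

    def forward(self, x):
        return torch.relu(x + self.body(x))   # skip connection: add the input back

x = torch.randn(1, 16, 32, 32)
print(ResidualBlock(16)(x).shape)             # torch.Size([1, 16, 32, 32])
```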

[BOOK][B] Information geometry and its applications

S Amari - 2016 - books.google.com
This is the first comprehensive book on information geometry, written by the founder of the
field. It begins with an elementary introduction to dualistic geometry and proceeds to a wide …

Active learning of dynamics for data-driven control using Koopman operators

I Abraham, TD Murphey - IEEE Transactions on Robotics, 2019 - ieeexplore.ieee.org
This paper presents an active learning strategy for robotic systems that takes into account
task information, enables fast learning, and allows control to be readily synthesized by …
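
As background for the Koopman-based learning the snippet mentions, a linear operator can be fit to snapshot data by least squares (a plain DMD-style fit). This sketch covers only that non-active ingredient, on toy data; the cited work additionally chooses where to collect data and synthesizes control.

```python
# Minimal sketch: fit a linear operator K so that x_{t+1} ≈ K x_t from
# snapshot pairs (DMD on state observables), using toy linear dynamics.
import numpy as np

rng = np.random.default_rng(0)
A_true = np.array([[0.95, 0.10], [-0.10, 0.95]])   # toy dynamics to recover
X = [rng.standard_normal(2)]
for _ in range(200):
    X.append(A_true @ X[-1])
X = np.array(X).T                                  # snapshots, shape (2, 201)

X_now, X_next = X[:, :-1], X[:, 1:]
K = X_next @ np.linalg.pinv(X_now)                 # least-squares fit
print(np.round(K, 3))                              # ≈ A_true
```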

Micro-batch training with batch-channel normalization and weight standardization

S Qiao, H Wang, C Liu, W Shen, A Yuille - arXiv preprint arXiv:1903.10520, 2019 - arxiv.org
Batch Normalization (BN) has become an out-of-box technique to improve deep network
training. However, its effectiveness is limited for micro-batch training, i.e., each GPU typically …
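
The Weight Standardization part of the title can be summarized as standardizing each output filter's weights before the convolution. The sketch below follows that idea; the epsilon and shapes are illustrative rather than taken from the paper's released code.

```python
# Minimal sketch of Weight Standardization: zero-mean, unit-variance weights
# per output filter, applied on the fly before the convolution.
import torch
import torch.nn as nn
import torch.nn.functional as F

class WSConv2d(nn.Conv2d):
    def forward(self, x):
        w = self.weight
        mean = w.mean(dim=(1, 2, 3), keepdim=True)
        std = w.std(dim=(1, 2, 3), keepdim=True) + 1e-5
        return F.conv2d(x, (w - mean) / std, self.bias,
                        self.stride, self.padding, self.dilation, self.groups)

x = torch.randn(2, 3, 32, 32)
print(WSConv2d(3, 8, kernel_size=3, padding=1)(x).shape)   # torch.Size([2, 8, 32, 32])
```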

Stochastic collapse: How gradient noise attracts SGD dynamics towards simpler subnetworks

F Chen, D Kunin, A Yamamura… - Advances in Neural …, 2024 - proceedings.neurips.cc
In this work, we reveal a strong implicit bias of stochastic gradient descent (SGD) that drives
overly expressive networks to much simpler subnetworks, thereby dramatically reducing the …

[BOOK][B] Algebraic geometry and statistical learning theory

S Watanabe - 2009 - books.google.com
Sure to be influential, Watanabe's book lays the foundations for the use of algebraic
geometry in statistical learning theory. Many models/machines are singular: mixture models …

Learning time-scales in two-layers neural networks

R Berthier, A Montanari, K Zhou - Foundations of Computational …, 2024 - Springer
Gradient-based learning in multi-layer neural networks displays a number of striking
features. In particular, the decrease rate of empirical risk is non-monotone even after …

Classification of malignant tumors in breast ultrasound using a pretrained deep residual network model and support vector machine

WC Shia, DR Chen - Computerized Medical Imaging and Graphics, 2021 - Elsevier
In this study, a transfer learning method was utilized to recognize and classify benign and
malignant breast tumors, using two-dimensional breast ultrasound (US) images, to decrease …
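
The generic pipeline the snippet describes, a pretrained residual network as a fixed feature extractor followed by an SVM, can be sketched as below. The backbone (ResNet-18) and the random placeholder data are assumptions for illustration, not the study's actual network or ultrasound dataset; loading the ImageNet weights requires a download.

```python
# Minimal sketch: extract features with a pretrained ResNet, then train an SVM.
import torch
import torchvision.models as models
from sklearn.svm import SVC

resnet = models.resnet18(weights="IMAGENET1K_V1")
resnet.fc = torch.nn.Identity()          # drop the classifier head, keep 512-d features
resnet.eval()

images = torch.randn(20, 3, 224, 224)    # placeholder batch (stand-in for US images)
labels = [0, 1] * 10                     # placeholder benign/malignant labels
with torch.no_grad():
    feats = resnet(images).numpy()       # (20, 512) feature matrix

clf = SVC(kernel="rbf").fit(feats, labels)
print(clf.predict(feats[:4]))
```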