Implicit regularization of dropout

Z Zhang, ZQJ Xu - IEEE Transactions on Pattern Analysis and …, 2024 - ieeexplore.ieee.org
It is important to understand how dropout, a popular regularization method, aids in achieving
a good generalization solution during neural network training. In this work, we present a …
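As a point of reference for the dropout mechanism this paper analyzes, below is a minimal sketch (not taken from the paper) of inverted dropout applied to a batch of hidden activations; the drop probability p and the NumPy-based implementation are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(0)

def dropout(h, p=0.5, training=True):
    # Inverted dropout: zero each unit with probability p and rescale the
    # survivors by 1/(1 - p) so the expected activation is unchanged.
    if not training or p == 0.0:
        return h
    mask = rng.random(h.shape) >= p   # keep each unit with probability 1 - p
    return h * mask / (1.0 - p)

h = rng.standard_normal((4, 8))       # a batch of hidden activations
h_dropped = dropout(h, p=0.5)         # roughly half of the units are zeroed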

Initialization is Critical to Whether Transformers Fit Composite Functions by Inference or Memorizing

Z Zhang, P Lin, Z Wang, Y Zhang, ZQJ Xu - arXiv preprint arXiv …, 2024 - arxiv.org
Transformers have shown impressive capabilities across various tasks, but their
performance on compositional problems remains a topic of debate. In this work, we …

Understanding the initial condensation of convolutional neural networks

Z Zhou, H Zhou, Y Li, ZQJ Xu - arXiv preprint arXiv:2305.09947, 2023 - arxiv.org
Previous research has shown that fully-connected networks with small initialization and
gradient-based training methods exhibit a phenomenon known as condensation during …

Connectivity Shapes Implicit Regularization in Matrix Factorization Models for Matrix Completion

Z Bai, J Zhao, Y Zhang - arXiv preprint arXiv:2405.13721, 2024 - arxiv.org
Matrix factorization models have been extensively studied as a valuable test-bed for
understanding the implicit biases of overparameterized models. Although both low nuclear …
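For readers unfamiliar with this test-bed, here is a minimal sketch (illustrative only, not the authors' code) of gradient descent on an overparameterized factorization U V^T fitted to a subset of observed entries of a low-rank matrix; the factor rank, step size, and small initialization scale are assumptions chosen for the example.

import numpy as np

rng = np.random.default_rng(0)
n, r = 20, 2
M = rng.standard_normal((n, r)) @ rng.standard_normal((r, n))  # rank-2 target matrix
observed = rng.random((n, n)) < 0.3                            # mask of observed entries

k, lr, scale = 5, 1e-2, 1e-2                                   # overparameterized rank k > r, small init
U = scale * rng.standard_normal((n, k))
V = scale * rng.standard_normal((n, k))

for _ in range(20000):
    R = (U @ V.T - M) * observed        # residual on observed entries only
    U, V = U - lr * (R @ V), V - lr * (R.T @ U)

gap = np.linalg.norm((U @ V.T - M) * ~observed)
print(f"fit error on unobserved entries: {gap:.3f}")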

Disentangle Sample Size and Initialization Effect on Perfect Generalization for Single-Neuron Target

J Zhao, Z Bai, Y Zhang - arXiv preprint arXiv:2405.13787, 2024 - arxiv.org
Overparameterized models like deep neural networks have the intriguing ability to recover
target functions with fewer sampled data points than parameters (see arXiv:2307.08921). To …

Initialization is Critical to Whether Transformers Fit Composite Functions by Reasoning or Memorizing

Z Zhang, P Lin, Z Wang, Y Zhang, ZQJ Xu - The Thirty-eighth Annual … - openreview.net
Transformers have shown impressive capabilities across various tasks, but their
performance on compositional problems remains a topic of debate. In this work, we …