Implicit regularization of dropout

Z Zhang, ZQJ Xu - IEEE Transactions on Pattern Analysis and …, 2024 - ieeexplore.ieee.org
It is important to understand how dropout, a popular regularization method, aids in achieving
a good generalization solution during neural network training. In this work, we present a …

Initialization is Critical to Whether Transformers Fit Composite Functions by Inference or Memorizing

Z Zhang, P Lin, Z Wang, Y Zhang, ZQJ Xu - arXiv preprint arXiv …, 2024 - arxiv.org
Transformers have shown impressive capabilities across various tasks, but their
performance on compositional problems remains a topic of debate. In this work, we …

Adaptive Methods through the Lens of SDEs: Theoretical Insights on the Role of Noise

EM Compagnoni, T Liu, R Islamov, FN Proske… - arXiv preprint arXiv …, 2024 - arxiv.org
Despite the vast empirical evidence supporting the efficacy of adaptive optimization methods
in deep learning, their theoretical understanding is far from complete. This work introduces …

Initialization is Critical to Whether Transformers Fit Composite Functions by Reasoning or Memorizing

Z Zhang, P Lin, Z Wang, Y Zhang, ZQJ Xu - The Thirty-eighth Annual … - openreview.net
Transformers have shown impressive capabilities across various tasks, but their
performance on compositional problems remains a topic of debate. In this work, we …