Implicit regularization of dropout

Z Zhang, ZQJ Xu - IEEE Transactions on Pattern Analysis and …, 2024 - ieeexplore.ieee.org
It is important to understand how dropout, a popular regularization method, aids in achieving
a good generalization solution during neural network training. In this work, we present a …
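As a point of reference for the dropout mechanism this paper analyzes, below is a minimal sketch (not taken from the paper) of inverted dropout applied to a batch of hidden activations; the drop probability p and the NumPy-based implementation are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(0)

def dropout(h, p=0.5, training=True):
    # Inverted dropout: zero each unit with probability p and rescale the
    # survivors by 1/(1 - p) so the expected activation is unchanged.
    if not training or p == 0.0:
        return h
    mask = rng.random(h.shape) >= p   # keep each unit with probability 1 - p
    return h * mask / (1.0 - p)

h = rng.standard_normal((4, 8))       # a batch of hidden activations
h_dropped = dropout(h, p=0.5)         # roughly half of the units are zeroed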

Initialization is Critical to Whether Transformers Fit Composite Functions by Inference or Memorizing

Z Zhang, P Lin, Z Wang, Y Zhang, ZQJ Xu - arXiv preprint arXiv …, 2024 - arxiv.org
Transformers have shown impressive capabilities across various tasks, but their
performance on compositional problems remains a topic of debate. In this work, we …

Understanding the initial condensation of convolutional neural networks

Z Zhou, H Zhou, Y Li, ZQJ Xu - arXiv preprint arXiv:2305.09947, 2023 - arxiv.org
Previous research has shown that fully-connected networks with small initialization and
gradient-based training methods exhibit a phenomenon known as condensation during …

Connectivity Shapes Implicit Regularization in Matrix Factorization Models for Matrix Completion

Z Bai, J Zhao, Y Zhang - arXiv preprint arXiv:2405.13721, 2024 - arxiv.org
Matrix factorization models have been extensively studied as a valuable test-bed for
understanding the implicit biases of overparameterized models. Although both low nuclear …
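For readers unfamiliar with this test-bed, here is a minimal sketch (illustrative only, not the authors' code) of gradient descent on an overparameterized factorization U V^T fitted to a subset of observed entries of a low-rank matrix; the factor rank, step size, and small initialization scale are assumptions chosen for the example.

import numpy as np

rng = np.random.default_rng(0)
n, r = 20, 2
M = rng.standard_normal((n, r)) @ rng.standard_normal((r, n))  # rank-2 target matrix
observed = rng.random((n, n)) < 0.3                            # mask of observed entries

k, lr, scale = 5, 1e-2, 1e-2                                   # overparameterized rank k > r, small init
U = scale * rng.standard_normal((n, k))
V = scale * rng.standard_normal((n, k))

for _ in range(20000):
    R = (U @ V.T - M) * observed        # residual on observed entries only
    U, V = U - lr * (R @ V), V - lr * (R.T @ U)

gap = np.linalg.norm((U @ V.T - M) * ~observed)
print(f"fit error on unobserved entries: {gap:.3f}")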

Disentangle Sample Size and Initialization Effect on Perfect Generalization for Single-Neuron Target

J Zhao, Z Bai, Y Zhang - arXiv preprint arXiv:2405.13787, 2024 - arxiv.org
Overparameterized models like deep neural networks have the intriguing ability to recover
target functions with fewer sampled data points than parameters (see arXiv:2307.08921). To …

Initialization is Critical to Whether Transformers Fit Composite Functions by Reasoning or Memorizing

Z Zhang, P Lin, Z Wang, Y Zhang, ZQJ Xu - The Thirty-eighth Annual … - openreview.net
Transformers have shown impressive capabilities across various tasks, but their
performance on compositional problems remains a topic of debate. In this work, we …