Z Zhang, P Lin, Z Wang, Y Zhang, ZQJ Xu - arXiv preprint arXiv …, 2024 - arxiv.org
Transformers have shown impressive capabilities across various tasks, but their performance on compositional problems remains a topic of debate. In this work, we …
Z Zhou, H Zhou, Y Li, ZQJ Xu - arXiv preprint arXiv:2305.09947, 2023 - arxiv.org
Previous research has shown that fully-connected networks with small initialization, trained with gradient-based methods, exhibit a phenomenon known as condensation during …
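
A minimal numpy sketch of the condensation phenomenon this entry refers to (not the paper's setup: the two-layer tanh network, the 2-D regression task, and all hyperparameters are illustrative assumptions). With small initialization, gradient descent tends to align the input weights of many hidden neurons along a few shared directions, which the pairwise cosine similarity at the end makes visible.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 2-D regression task (illustrative; not the paper's experiments).
n, d, m = 200, 2, 20
X = rng.uniform(-1, 1, (n, d))
y = np.tanh(X @ np.array([1.5, -1.0])).reshape(-1, 1)  # target function

scale = 1e-4                             # small initialization: the regime
W = scale * rng.standard_normal((m, d))  # in which condensation is reported
a = scale * rng.standard_normal((m, 1))

lr = 0.05
for step in range(50_000):
    H = np.tanh(X @ W.T)                 # (n, m) hidden activations
    err = H @ a - y                      # (n, 1) residual
    # Gradients of the mean-squared loss 0.5 * mean(err**2).
    ga = H.T @ err / n
    gW = ((err @ a.T) * (1 - H**2)).T @ X / n
    a -= lr * ga
    W -= lr * gW

# Condensation diagnostic: |cosine similarity| between input weight vectors;
# values near 1 mean the neurons have collapsed onto shared directions.
Wn = W / (np.linalg.norm(W, axis=1, keepdims=True) + 1e-12)
cos = np.abs(Wn @ Wn.T)
off_diag = cos[~np.eye(m, dtype=bool)]
print(f"mean |cosine| between hidden neurons: {off_diag.mean():.3f}")
```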
Z Bai, J Zhao, Y Zhang - arXiv preprint arXiv:2405.13721, 2024 - arxiv.org
Matrix factorization models have been extensively studied as a valuable test-bed for understanding the implicit biases of overparameterized models. Although both low nuclear …
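
A minimal sketch of the standard matrix-completion test-bed this entry describes, assuming a rank-1 ground truth, roughly 30% observed entries, and small-initialization gradient descent; the hyperparameters are illustrative, not the paper's. Despite the factorization having full inner dimension, gradient descent from small initialization tends to find a low-rank (and low-nuclear-norm) solution that fills in the unobserved entries.

```python
import numpy as np

rng = np.random.default_rng(0)

# Rank-1 ground truth, normalized to unit spectral norm, partially observed.
n = 20
M = rng.standard_normal((n, 1)) @ rng.standard_normal((1, n))
M /= np.linalg.norm(M, 2)
mask = rng.random((n, n)) < 0.3          # ~30% of entries observed

# Overparameterized factorization X = U @ V.T with full inner dimension;
# the small initialization is what biases gradient descent toward low rank.
scale = 1e-3
U = scale * rng.standard_normal((n, n))
V = scale * rng.standard_normal((n, n))

lr = 0.2
for step in range(20_000):
    R = (U @ V.T - M) * mask             # residual on observed entries only
    U, V = U - lr * R @ V, V - lr * R.T @ U

X = U @ V.T
s = np.linalg.svd(X, compute_uv=False)
print("top singular values:", np.round(s[:4], 3))   # one dominant value
err = np.linalg.norm((X - M)[~mask]) / np.linalg.norm(M[~mask])
print(f"relative error on unobserved entries: {err:.3f}")
```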
J Zhao, Z Bai, Y Zhang - arXiv preprint arXiv:2405.13787, 2024 - arxiv.org
Overparameterized models like deep neural networks have the intriguing ability to recover target functions with fewer sampled data points than parameters (see arXiv:2307.08921). To …
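
A minimal sketch of one well-known instance of this phenomenon, not necessarily the paper's setting: sparse recovery with a "diagonal linear network" reparametrization w = u*u - v*v. The task, reparametrization, and hyperparameters are illustrative assumptions; the point is that an overparameterized model trained from small initialization can recover a target from fewer samples (n = 40) than parameters (d = 100), via an implicit bias toward small l1-norm.

```python
import numpy as np

rng = np.random.default_rng(0)

# Sparse target: d parameters, k nonzero, observed through n < d samples.
d, k, n = 100, 3, 40
w_star = np.zeros(d)
w_star[:k] = [2.0, -1.5, 1.0]
A = rng.standard_normal((n, d)) / np.sqrt(n)
y = A @ w_star

# Diagonal linear network w = u*u - v*v; with small initialization,
# gradient descent is implicitly biased toward sparse interpolants.
scale = 1e-4
u = scale * np.ones(d)
v = scale * np.ones(d)

lr = 0.05
for step in range(100_000):
    g = A.T @ (A @ (u * u - v * v) - y)  # gradient w.r.t. the product w
    u -= lr * 2 * u * g                  # chain rule through u*u
    v += lr * 2 * v * g                  # chain rule through -v*v

w = u * u - v * v
rel = np.linalg.norm(w - w_star) / np.linalg.norm(w_star)
print(f"relative recovery error: {rel:.3f}")
print("coordinates with |w_j| > 0.1:", int((np.abs(w) > 0.1).sum()))
```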