| Title | Authors | Venue | Cited by | Year |
| --- | --- | --- | --- | --- |
| Embedding principle of loss landscape of deep neural networks | Y Zhang, Z Zhang, T Luo, ZJ Xu | Advances in Neural Information Processing Systems 34, 14848-14859 | 32 | 2021 |
| Embedding principle: a hierarchical structure of loss landscape of deep neural networks | Y Zhang, Y Li, Z Zhang, T Luo, ZQJ Xu | Journal of Machine Learning | 27 | 2021 |
| Implicit regularization of dropout | Z Zhang, ZQJ Xu | IEEE Transactions on Pattern Analysis and Machine Intelligence | 16* | 2024 |
| Linear stability hypothesis and rank stratification for nonlinear models | Y Zhang, Z Zhang, L Zhang, Z Bai, T Luo, ZQJ Xu | arXiv preprint arXiv:2211.11623 | 5 | 2022 |
| Optimistic estimate uncovers the potential of nonlinear models | Y Zhang, Z Zhang, L Zhang, Z Bai, T Luo, ZQJ Xu | arXiv preprint arXiv:2307.08921 | 4 | 2023 |
| Anchor function: a type of benchmark functions for studying language models | Z Zhang, Z Wang, J Yao, Z Zhou, X Li, ZQJ Xu | arXiv preprint arXiv:2401.08309 | 3 | 2024 |
| Stochastic modified equations and dynamics of dropout algorithm | Z Zhang, Y Li, T Luo, ZQJ Xu | ICLR 2024 | 2 | 2023 |
| Towards understanding how transformer perform multi-step reasoning with matching operation | Z Wang, Y Wang, Z Zhang, Z Zhou, H Jin, T Hu, J Sun, Z Li, Y Zhang, ... | arXiv preprint arXiv:2405.15302 | 1 | 2024 |
| Initialization is Critical to Whether Transformers Fit Composite Functions by Inference or Memorizing | Z Zhang, P Lin, Z Wang, Y Zhang, ZQJ Xu | arXiv preprint arXiv:2405.05409 | 1 | 2024 |
| Loss Spike in Training Neural Networks | Z Zhang, ZQJ Xu | arXiv preprint arXiv:2305.12133 | 1 | 2023 |
| Loss Jump During Loss Switch in Solving PDEs with Neural Networks | Z Wang, L Zhang, Z Zhang, ZQJ Xu | arXiv preprint arXiv:2405.03095 | | 2024 |