On layer normalization in the transformer architecture R Xiong, Y Yang, D He, K Zheng, S Zheng, C Xing, H Zhang, Y Lan, ... International Conference on Machine Learning, 10524-10533, 2020 | 883 | 2020 |
A nonconvex approach for phase retrieval: Reshaped Wirtinger flow and incremental algorithms H Zhang, Y Liang, Y Chi Journal of Machine Learning Research 18 (141), 1-35, 2017 | 307* | 2017 |
Differentially private fine-tuning of language models D Yu, S Naik, A Backurs, S Gopi, HA Inan, G Kamath, J Kulkarni, YT Lee, ... arXiv preprint arXiv:2110.06500, 2021 | 247 | 2021 |
Provable non-convex phase retrieval with outliers: Median truncated Wirtinger flow H Zhang, Y Chi, Y Liang International conference on machine learning, 1022-1031, 2016 | 140* | 2016 |
Block-diagonal Hessian-free optimization for recurrent and convolutional neural networks H Zhang, C Xiong US Patent 11,386,327, 2022 | 98* | 2022 |
Do not let privacy overbill utility: Gradient embedding perturbation for private learning D Yu, H Zhang, W Chen, TY Liu arXiv preprint arXiv:2102.12677, 2021 | 97 | 2021 |
Large scale private learning via low-rank reparametrization D Yu, H Zhang, W Chen, J Yin, TY Liu International Conference on Machine Learning, 12208-12218, 2021 | 82 | 2021 |
SGD converges to global minimum in deep learning via star-convex path Y Zhou, J Yang, H Zhang, Y Liang, V Tarokh arXiv preprint arXiv:1901.00451, 2019 | 73 | 2019 |
Availability attacks create shortcuts D Yu, H Zhang, W Chen, J Yin, TY Liu Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and …, 2022 | 64* | 2022 |
Adaptive inertia: Disentangling the effects of adaptive learning rate and momentum Z Xie, X Wang, H Zhang, I Sato, M Sugiyama International conference on machine learning, 24430-24459, 2022 | 54* | 2022 |
How does data augmentation affect privacy in machine learning? D Yu, H Zhang, W Chen, J Yin, TY Liu Proceedings of the AAAI Conference on Artificial Intelligence 35 (12), 10746 …, 2021 | 51 | 2021 |
Understanding generalization error of SGD in nonconvex optimization Y Zhou, Y Liang, H Zhang Machine Learning, 1-31, 2022 | 45* | 2022 |
Gradient perturbation is underrated for differentially private convex optimization D Yu, H Zhang, W Chen, TY Liu, J Yin arXiv preprint arXiv:1911.11363, 2019 | 42 | 2019 |
Exploring the limits of differentially private deep learning with group-wise clipping J He, X Li, D Yu, H Zhang, J Kulkarni, YT Lee, A Backurs, N Yu, J Bian arXiv preprint arXiv:2212.01539, 2022 | 36 | 2022 |
Convergence of distributed stochastic variance reduced methods without sampling extra data S Cen, H Zhang, Y Chi, W Chen, TY Liu IEEE Transactions on Signal Processing 68, 3976-3989, 2020 | 31 | 2020 |
Non-convex low-rank matrix recovery with arbitrary outliers via median-truncated gradient descent Y Li, Y Chi, H Zhang, Y Liang Information and Inference: A Journal of the IMA 9 (2), 289-325, 2020 | 31 | 2020 |
G-SGD: Optimizing ReLU Neural Networks in its Positively Scale-Invariant Space Q Meng, S Zheng, H Zhang, W Chen, ZM Ma, TY Liu arXiv preprint arXiv:1802.03713, 2018 | 30 | 2018 |
Normalized/clipped SGD with perturbation for differentially private non-convex optimization X Yang, H Zhang, W Chen, TY Liu arXiv preprint arXiv:2206.13033, 2022 | 29 | 2022 |
The capacity region of the source-type model for secret key and private key generation H Zhang, L Lai, Y Liang, H Wang IEEE Transactions on Information Theory 60 (10), 6389-6398, 2014 | 29* | 2014 |
Convergence of AdaGrad for non-convex objectives: Simple proofs and relaxed assumptions B Wang, H Zhang, Z Ma, W Chen The Thirty Sixth Annual Conference on Learning Theory, 161-190, 2023 | 27 | 2023 |