Gradient Descent Maximizes the Margin of Homogeneous Neural Networks. K Lyu, J Li. International Conference on Learning Representations (ICLR 2020), 2020. Cited by 303.
Theoretical Analysis of Auto Rate-Tuning by Batch Normalization. S Arora, Z Li, K Lyu. International Conference on Learning Representations (ICLR 2019), 2019. Cited by 125.
Towards Resolving the Implicit Bias of Gradient Descent for Matrix Factorization: Greedy Low-Rank Learning. Z Li, Y Luo, K Lyu. International Conference on Learning Representations (ICLR 2021), 2021. Cited by 120.
Learning Gradient Descent: Better Generalization and Longer Horizons. K Lv, S Jiang, J Li. 34th International Conference on Machine Learning (ICML 2017), PMLR 70, 2247-2255, 2017. Cited by 113.
Gradient Descent on Two-layer Nets: Margin Maximization and Simplicity Bias. K Lyu, Z Li, R Wang, S Arora. 35th Conference on Neural Information Processing Systems (NeurIPS 2021), 2021. Cited by 70.
Reconciling Modern Deep Learning with Traditional Optimization Analyses: The Intrinsic Learning Rate. Z Li, K Lyu, S Arora. 34th Conference on Neural Information Processing Systems (NeurIPS 2020), 2020. Cited by 66.
Understanding the Generalization Benefit of Normalization Layers: Sharpness Reduction. K Lyu, Z Li, S Arora. 36th Conference on Neural Information Processing Systems (NeurIPS 2022), 2022. Cited by 64.
Fine-Grained Complexity Meets IP = PSPACE. L Chen, S Goldwasser, K Lyu, GN Rothblum, A Rubinstein. 30th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA 2019), 1-20, 2019. Cited by 38.
DistillSpec: Improving Speculative Decoding via Knowledge Distillation. Y Zhou, K Lyu, AS Rawat, AK Menon, A Rostamizadeh, S Kumar, JF Kagy, ... International Conference on Learning Representations (ICLR 2024), 2024. Cited by 29.
Understanding Incremental Learning of Gradient Descent: A Fine-Grained Analysis of Matrix Sensing. J Jin, Z Li, K Lyu, SS Du, JD Lee. International Conference on Machine Learning (ICML 2023), 15200-15238, 2023. Cited by 26.
On the SDEs and Scaling Rules for Adaptive Gradient Algorithms. S Malladi, K Lyu, A Panigrahi, S Arora. 36th Conference on Neural Information Processing Systems (NeurIPS 2022), 2022. Cited by 25.
Why (and When) Does Local SGD Generalize Better than SGD? X Gu, K Lyu, L Huang, S Arora. International Conference on Learning Representations (ICLR 2023), 2023. Cited by 17.
Dichotomy of Early and Late Phase Implicit Biases Can Provably Induce Grokking. K Lyu, J Jin, Z Li, SS Du, JD Lee, W Hu. International Conference on Learning Representations (ICLR 2024), 2024. Cited by 14.
Single-Source Bottleneck Path Algorithm Faster than Sorting for Sparse Graphs. R Duan, K Lyu, H Wu, Y Xie. 45th International Colloquium on Automata, Languages, and Programming (ICALP 2018), 2018. Cited by 7.
RNNs Are Not Transformers (Yet): The Key Bottleneck on In-Context Retrieval. K Wen, X Dang, K Lyu. arXiv preprint arXiv:2402.18510, 2024. Cited by 5.
Keeping LLMs Aligned After Fine-Tuning: The Crucial Role of Prompt Templates. K Lyu, H Zhao, X Gu, D Yu, A Goyal, S Arora. arXiv preprint arXiv:2402.18540, 2024. Cited by 5.
The Marginal Value of Momentum for Small Learning Rate SGD. R Wang, S Malladi, T Wang, K Lyu, Z Li. International Conference on Learning Representations (ICLR 2024), 2024. Cited by 5.
New Definitions and Evaluations for Saliency Methods: Staying Intrinsic, Complete and Sound. A Gupta, N Saunshi, D Yu, K Lyu, S Arora. 36th Conference on Neural Information Processing Systems (NeurIPS 2022), 2022. Cited by 5.
Safety Alignment Should Be Made More Than Just a Few Tokens Deep. X Qi, A Panda, K Lyu, X Ma, S Roy, A Beirami, P Mittal, P Henderson. arXiv preprint arXiv:2406.05946, 2024. Cited by 1.
Efficient Stagewise Pretraining via Progressive Subnetworks. A Panigrahi, N Saunshi, K Lyu, S Miryoosefi, S Reddi, S Kale, S Kumar. arXiv preprint arXiv:2402.05913, 2024. Cited by 1.