Linformer: Self-attention with linear complexity | S Wang, BZ Li, M Khabsa, H Fang, H Ma | arXiv preprint arXiv:2006.04768, 2020 | 1502 | 2020 |
Clear: Contrastive learning for sentence representation | Z Wu, S Wang, J Gu, M Khabsa, F Sun, H Ma | arXiv preprint arXiv:2012.15466, 2020 | 338 | 2020 |
Blockwise Self-Attention for Long Document Understanding | J Qiu, H Ma, O Levy, SW Yih, S Wang, J Tang | EMNLP 2020, 2019 | 222 | 2019 |
Entailment as few-shot learner | S Wang, H Fang, M Khabsa, H Mao, H Ma | arXiv preprint arXiv:2104.14690, 2021 | 186 | 2021 |
Coded Sparse Matrix Multiplication | S Wang, J Liu, N Shroff | International Conference on Machine Learning, 5139-5147, 2018 | 139 | 2018 |
Luna: Linear unified nested attention | X Ma, X Kong, S Wang, C Zhou, J May, H Ma, L Zettlemoyer | Advances in Neural Information Processing Systems 34, 2441-2453, 2021 | 123 | 2021 |
Language models as fact checkers? | N Lee, BZ Li, S Wang, W Yih, H Ma, M Khabsa | Proceedings of the Third Workshop on Fact Extraction and VERification (FEVER …, 2020 | 75 | 2020 |
Effective long-context scaling of foundation models | W Xiong, J Liu, I Molybog, H Zhang, P Bhargava, R Hou, L Martin, ... | arXiv preprint arXiv:2309.16039, 2023 | 64 | 2023 |
Lm-infinite: Simple on-the-fly length generalization for large language models | C Han, Q Wang, W Xiong, Y Chen, H Ji, S Wang | arXiv preprint arXiv:2308.16137, 2023 | 60 | 2023 |
Computation efficient coded linear transform | S Wang, J Liu, N Shroff, P Yang | The 22nd International Conference on Artificial Intelligence and Statistics …, 2019 | 60* | 2019 |
Coded caching with heterogenous cache sizes | S Wang, W Li, X Tian, H Liu | arXiv preprint arXiv:1504.01123, 2015 | 58 | 2015 |
Fundamental limits of approximate gradient coding | S Wang, J Liu, N Shroff | SIGMETRICS, 1-22, 2019 | 51 | 2019 |
A new evolutionary multi-objective algorithm to virtual machine placement in virtualized data center | C Liu, C Shen, S Li, S Wang | 2014 IEEE 5th international conference on software engineering and service …, 2014 | 47 | 2014 |
The performance analysis of coded cache in wireless fading channel | W Huang, S Wang, L Ding, F Yang, W Zhang | arXiv preprint arXiv:1504.01452, 2015 | 41 | 2015 |
Fundamental limits of heterogenous cache | S Wang, W Li, X Tian, H Liu | arXiv preprint arXiv:1504.01123, 2015 | 41 | 2015 |
A new approach to multi-objective virtual machine placement in virtualized data center | S Wang, H Gu, G Wu | 2013 IEEE eighth international conference on networking, architecture and …, 2013 | 39 | 2013 |
IDPG: An instance-dependent prompt generation method | Z Wu, S Wang, J Gu, R Hou, Y Dong, VG Vydiswaran, H Ma | NAACL 2022, 2022 | 38 | 2022 |
Self-attention with linear complexity. arXiv 2020 | S Wang, BZ Li, M Khabsa, H Fang, HL Ma | arXiv preprint arXiv:2006.04768, 2021 | 26 | 2021 |
To pretrain or not to pretrain: examining the benefits of pretraining on resource rich tasks | S Wang, M Khabsa, H Ma | Proceedings of the 58th Annual Meeting of the Association for Computational …, 2020 | 26 | 2020 |
A new alternating direction method for linear programming | S Wang, N Shroff | Advances in Neural Information Processing Systems 30, 2017 | 25 | 2017 |