Unified language model pre-training for natural language understanding and generation. L Dong, N Yang, W Wang, F Wei, X Liu, Y Wang, J Gao, M Zhou, HW Hon. Advances in Neural Information Processing Systems 32, 2019. Cited by: 1653.
MiniLM: Deep self-attention distillation for task-agnostic compression of pre-trained transformers. W Wang, F Wei, L Dong, H Bao, N Yang, M Zhou. Advances in Neural Information Processing Systems 33, 5776-5788, 2020. Cited by: 929.
Gated self-matching networks for reading comprehension and question answering. W Wang, N Yang, F Wei, B Chang, M Zhou. Proceedings of the 55th Annual Meeting of the Association for Computational …, 2017. Cited by: 826.
UniLMv2: Pseudo-masked language models for unified language model pre-training. H Bao, L Dong, F Wei, W Wang, N Yang, X Liu, Y Wang, J Gao, S Piao, ... International Conference on Machine Learning, 642-652, 2020. Cited by: 390.
Image as a foreign language: BEiT pretraining for all vision and vision-language tasks. W Wang, H Bao, L Dong, J Bjorck, Z Peng, Q Liu, K Aggarwal, ... arXiv preprint arXiv:2208.10442, 2022. Cited by: 336*.
Language is not all you need: Aligning perception with language models. S Huang, L Dong, W Wang, Y Hao, S Singhal, S Ma, T Lv, L Cui, ... Advances in Neural Information Processing Systems 36, 2024. Cited by: 303.
Kosmos-2: Grounding multimodal large language models to the world. Z Peng, W Wang, L Dong, Y Hao, S Huang, S Ma, F Wei. arXiv preprint arXiv:2306.14824, 2023. Cited by: 303.
InfoXLM: An information-theoretic framework for cross-lingual language model pre-training. Z Chi, L Dong, F Wei, N Yang, S Singhal, W Wang, X Song, XL Mao, ... arXiv preprint arXiv:2007.07834, 2020. Cited by: 303.
VLMo: Unified vision-language pre-training with mixture-of-modality-experts. H Bao, W Wang, L Dong, Q Liu, OK Mohammed, K Aggarwal, S Som, ... Advances in Neural Information Processing Systems 35, 32897-32912, 2022. Cited by: 224.
Graph-based dependency parsing with bidirectional LSTM. W Wang, B Chang. Proceedings of the 54th Annual Meeting of the Association for Computational …, 2016. Cited by: 180.
MiniLMv2: Multi-head self-attention relation distillation for compressing pretrained transformers. W Wang, H Bao, S Huang, L Dong, F Wei. arXiv preprint arXiv:2012.15828, 2020. Cited by: 164.
Multiway attention networks for modeling sentence pairs. C Tan, F Wei, W Wang, W Lv, M Zhou. IJCAI, 4411-4417, 2018. Cited by: 146.
Cross-lingual natural language generation via pre-training. Z Chi, L Dong, F Wei, W Wang, XL Mao, H Huang. Proceedings of the AAAI Conference on Artificial Intelligence 34 (05), 7570-7577, 2020. Cited by: 139.
Language models are general-purpose interfaces. Y Hao, H Song, L Dong, S Huang, Z Chi, W Wang, S Ma, F Wei. arXiv preprint arXiv:2206.06336, 2022. Cited by: 88.
LongNet: Scaling transformers to 1,000,000,000 tokens. J Ding, S Ma, L Dong, X Zhang, S Huang, W Wang, N Zheng, F Wei. arXiv preprint arXiv:2307.02486, 2023. Cited by: 78.
Learning to ask unanswerable questions for machine reading comprehension. H Zhu, L Dong, F Wei, W Wang, B Qin, T Liu. arXiv preprint arXiv:1906.06045, 2019. Cited by: 53.
Harvesting and refining question-answer pairs for unsupervised QA. Z Li, W Wang, L Dong, F Wei, K Xu. arXiv preprint arXiv:2005.02925, 2020. Cited by: 50.
Consistency regularization for cross-lingual fine-tuning. B Zheng, L Dong, S Huang, W Wang, Z Chi, S Singhal, W Che, T Liu, ... arXiv preprint arXiv:2106.08226, 2021. Cited by: 43.
VL-BEiT: Generative vision-language pretraining. H Bao, W Wang, L Dong, F Wei. arXiv preprint arXiv:2206.01127, 2022. Cited by: 38.
Adapt-and-distill: Developing small, fast and effective pretrained language models for domains. Y Yao, S Huang, W Wang, L Dong, F Wei. arXiv preprint arXiv:2106.13474, 2021. Cited by: 36.