DeepSeek-Coder-V2: Breaking the Barrier of Closed-Source Models in Code Intelligence. Q Zhu, D Guo, Z Shao, D Yang, P Wang, R Xu, Y Wu, Y Li, H Gao, S Ma, et al. arXiv preprint arXiv:2406.11931, 2024. Cited by 52.
DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model. A Liu, B Feng, B Wang, B Wang, B Liu, C Zhao, C Dengr, C Ruan, D Dai, et al. arXiv preprint arXiv:2405.04434, 2024. Cited by 40.
Domain Adaptation via Maximizing Surrogate Mutual Information. H Zhao, C Ma, Q Chen, ZH Deng. arXiv preprint arXiv:2110.12184, 2021. Cited by 11.
Selecting Large Language Model to Fine-tune via Rectified Scaling Law. H Lin, B Huang, H Ye, Q Chen, Z Wang, S Li, J Ma, X Wan, J Zou, Y Liang. arXiv preprint arXiv:2402.02314, 2024. Cited by 7.
Exploring In-Context Learning for Knowledge Grounded Dialog Generation. Q Chen, W Wu, S Li. Findings of the Association for Computational Linguistics: EMNLP 2023, 10071–…, 2023. Cited by 6.
Retrieval-based Full-length Wikipedia Generation for Emergent Events. J Zhang, EJ Yu, Q Chen, C Xiong, D Zhu, H Qian, M Song, X Li, Q Liu, S Li. arXiv preprint arXiv:2402.18264, 2024. Cited by 2.