Semi-offline reinforcement learning for optimized text generation. C Chen, X Wang, Y Jin, VY Dong, L Dong, J Cao, Y Liu, R Yan. International Conference on Machine Learning, 5087-5103, 2023. (Cited by 13)
Fortify the shortest stave in attention: Enhancing context awareness of large language models for effective tool use. Y Chen, A Lv, TE Lin, C Chen, Y Wu, F Huang, Y Li, R Yan. Proceedings of the 62nd Annual Meeting of the Association for Computational …, 2024. (Cited by 8)
A pre-training strategy for zero-resource response selection in knowledge-grounded conversations. C Tao, C Chen, J Feng, JR Wen, R Yan. Proceedings of the 59th Annual Meeting of the Association for Computational …, 2021. (Cited by 8)
CycleAlign: Iterative distillation from black-box LLM to white-box models for better human alignment. J Hong, Q Tu, C Chen, X Gao, J Zhang, R Yan. arXiv preprint arXiv:2310.16271, 2023. (Cited by 7)
Masked thought: Simply masking partial reasoning steps can improve mathematical reasoning learning of language models. C Chen, X Wang, TE Lin, A Lv, Y Wu, X Gao, JR Wen, R Yan, Y Li. arXiv preprint arXiv:2403.02178, 2024. (Cited by 5)
Prototypical Reward Network for Data-Efficient RLHF. J Zhang, X Wang, Y Jin, C Chen, X Zhang, K Liu. arXiv preprint arXiv:2406.06606, 2024. (Cited by 4)
Personalized chit-chat generation for recommendation using external chat corpora. C Chen, X Wang, X Yi, F Wu, X Xie, R Yan. Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and …, 2022. (Cited by 4)
Prototypical Reward Network for Data-Efficient Model Alignment. J Zhang, X Wang, Y Jin, C Chen, X Zhang, K Liu. Proceedings of the 62nd Annual Meeting of the Association for Computational …, 2024.