ALP-KD: Attention-based layer projection for knowledge distillation | P Passban, Y Wu, M Rezagholizadeh, Q Liu | Proceedings of the AAAI Conference on Artificial Intelligence 35 (15), 13657 …, 2021 | 117 | 2021 |
Why skip if you can combine: A simple knowledge distillation technique for intermediate layers | Y Wu, P Passban, M Rezagholizadeh, Q Liu | arXiv preprint arXiv:2010.03034, 2020 | 30 | 2020 |
Universal-KD: Attention-based output-grounded intermediate layer knowledge distillation | Y Wu, M Rezagholizadeh, A Ghaddar, MA Haidar, A Ghodsi | Proceedings of the 2021 Conference on Empirical Methods in Natural Language …, 2021 | 23 | 2021 |
Revisiting pre-trained language models and their evaluation for Arabic natural language understanding | A Ghaddar, Y Wu, S Bagga, A Rashid, K Bibi, M Rezagholizadeh, C Xing, ... | arXiv preprint arXiv:2205.10687, 2022 | 17 | 2022 |
JABER: Junior Arabic BERT | A Ghaddar, Y Wu, A Rashid, K Bibi, M Rezagholizadeh, C Xing, Y Wang, ... | arXiv preprint arXiv:2112.04329, 2021 | 10* | 2021 |
Lumen & media segmentation of IVUS images via ellipse fitting using a wavelet-decomposed subband CNN | P Sinha, Y Wu, I Psaromiligkos, Z Zilic | 2020 IEEE 30th International Workshop on Machine Learning for Signal …, 2020 | 10 | 2020 |
AraMUS: Pushing the limits of data and model scale for Arabic natural language processing | A Alghamdi, X Duan, W Jiang, Z Wang, Y Wu, Q Xia, Z Wang, Y Zheng, ... | arXiv preprint arXiv:2306.06800, 2023 | 6 | 2023 |
Efficient Citer: Tuning Large Language Models for Enhanced Answer Quality and Verification | M Tahaei, A Jafari, A Rashid, D Alfonso-Hermelo, K Bibi, Y Wu, A Ghodsi, ... | Findings of the Association for Computational Linguistics: NAACL 2024, 4443-4450, 2024 | 2 | 2024 |
Method and system for training a neural network model using knowledge distillation | P Passban, Y Wu, M Rezagholizadeh | US Patent App. 17/469,573, 2022 | 2 | 2022 |