Improved knowledge distillation via teacher assistant

R Yu, S Liu, X Wang - IEEE Transactions on Pattern Analysis …, 2023 - ieeexplore.ieee.org

Recent success of deep learning is largely attributed to the sheer amount of data used for
training deep neural networks. Despite the unprecedented success, the massive data …

被引用次数：135 相关文章所有 9 个版本

[PDF] arxiv.org

Knowledge distillation and student-teacher learning for visual intelligence: A review and new outlooks

L Wang, KJ Yoon - IEEE transactions on pattern analysis and …, 2021 - ieeexplore.ieee.org

Deep neural models, in recent years, have been successful in almost every field, even
solving the most complex problem statements. However, these models are huge in size with …

被引用次数：809 相关文章所有 10 个版本

[PDF] arxiv.org

Metamath: Bootstrap your own mathematical questions for large language models

L Yu, W Jiang, H Shi, J Yu, Z Liu, Y Zhang… - arXiv preprint arXiv …, 2023 - arxiv.org

Large language models (LLMs) have pushed the limits of natural language understanding
and exhibited excellent problem-solving ability. Despite the great success, most existing …

被引用次数：401 相关文章所有 5 个版本

[PDF] thecvf.com

Decoupled knowledge distillation

B Zhao, Q Cui, R Song, Y Qiu… - Proceedings of the IEEE …, 2022 - openaccess.thecvf.com

State-of-the-art distillation methods are mainly based on distilling deep features from
intermediate layers, while the significance of logit distillation is greatly overlooked. To …

被引用次数：755 相关文章所有 5 个版本

[PDF] neurips.cc

Knowledge distillation from a stronger teacher

T Huang, S You, F Wang, C Qian… - Advances in Neural …, 2022 - proceedings.neurips.cc

Unlike existing knowledge distillation methods focus on the baseline settings, where the
teacher models and training strategies are not that strong and competing as state-of-the-art …

被引用次数：244 相关文章所有 6 个版本

[PDF] baai.ac.cn

A survey on vision transformer

K Han, Y Wang, H Chen, X Chen, J Guo… - IEEE transactions on …, 2022 - ieeexplore.ieee.org

Transformer, first applied to the field of natural language processing, is a type of deep neural
network mainly based on the self-attention mechanism. Thanks to its strong representation …

被引用次数：2611 相关文章所有 7 个版本

[PDF] thecvf.com

Decoupled multimodal distilling for emotion recognition

Y Li, Y Wang, Z Cui - … of the IEEE/CVF Conference on …, 2023 - openaccess.thecvf.com

Human multimodal emotion recognition (MER) aims to perceive human emotions via
language, visual and acoustic modalities. Despite the impressive performance of previous …

被引用次数：111 相关文章所有 8 个版本

[PDF] arxiv.org

Knowledge distillation: A survey

J Gou, B Yu, SJ Maybank, D Tao - International Journal of Computer Vision, 2021 - Springer

In recent years, deep neural networks have been successful in both industry and academia,
especially for computer vision tasks. The great success of deep learning is mainly due to its …

被引用次数：3258 相关文章所有 12 个版本

[PDF] neurips.cc

Knowledge diffusion for distillation

T Huang, Y Zhang, M Zheng, S You… - Advances in …, 2023 - proceedings.neurips.cc

The representation gap between teacher and student is an emerging topic in knowledge
distillation (KD). To reduce the gap and improve the performance, current methods often …

被引用次数：52 相关文章所有 5 个版本

[PDF] arxiv.org

A survey on visual transformer

K Han, Y Wang, H Chen, X Chen, J Guo, Z Liu… - arXiv preprint arXiv …, 2020 - arxiv.org

Transformer, first applied to the field of natural language processing, is a type of deep neural
network mainly based on the self-attention mechanism. Thanks to its strong representation …

被引用次数：389 相关文章所有 3 个版本

高级搜索

QQ 群