Model compression of deep neural network architectures for visual pattern recognition: Current status and future directions

S Bhalgaonkar, M Munot - Computers and Electrical Engineering, 2024 - Elsevier
Visual Pattern Recognition Networks (VPRNs) are widely used in various visual
data based applications such as computer vision and edge AI. VPRNs help to enhance a …

A Survey on Knowledge Distillation: Recent Advancements

A Moslemi, A Briskina, Z Dang, J Li - Machine Learning with Applications, 2024 - Elsevier
Deep learning has achieved notable success across academia, medicine, and industry. Its
ability to identify complex patterns in large-scale data and to manage millions of parameters …

EarSE: Bringing Robust Speech Enhancement to COTS Headphones

D Duan, Y Chen, W Xu, T Li - Proceedings of the ACM on Interactive …, 2024 - dl.acm.org
Speech enhancement is regarded as the key to the quality of digital communication and is
gaining increasing attention in the research field of audio processing. In this paper, we …

Table tennis track detection based on temporal feature multiplexing network

W Li, X Liu, K An, C Qin, Y Cheng - Sensors, 2023 - mdpi.com
Recording the trajectory of table tennis balls in real-time enables the analysis of the
opponent's attacking characteristics and weaknesses. The current analysis of the ball paths …

Adapting a ConvNeXt model to audio classification on AudioSet

T Pellegrini, I Khalfaoui-Hassani, E Labbé… - arXiv preprint arXiv …, 2023 - arxiv.org
In computer vision, convolutional neural networks (CNNs) such as ConvNeXt have been
able to surpass state-of-the-art transformers, partly thanks to depthwise separable …

Distilling the knowledge of transformers and CNNs with CP-Mobile

F Schmid, T Morocutti, S Masoudian… - Proceedings of the …, 2023 - dcase.community
Designing lightweight models that require limited computational resources and can operate
on edge devices is a major trajectory in deep learning research. In the context of Acoustic …

MT4MTL-KD: a multi-teacher knowledge distillation framework for triplet recognition

S Gui, Z Wang, J Chen, X Zhou… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
The recognition of surgical triplets plays a critical role in the practical application of surgical
videos. It involves the sub-tasks of recognizing instruments, verbs, and targets, while …

Multi-Level Transfer Learning using Incremental Granularities for environmental sound classification and detection

JW Chang, HS Ma, ZY Hu - Applied Soft Computing, 2025 - Elsevier
As sound recognition and classification models become more complex, their performance
improves, but this comes with higher computational demands. This work addresses the …

CED: Consistent ensemble distillation for audio tagging

H Dinkel, Y Wang, Z Yan, J Zhang… - ICASSP 2024-2024 …, 2024 - ieeexplore.ieee.org
Augmentation and knowledge distillation (KD) are well-established techniques employed in
audio classification tasks, aimed at enhancing performance and reducing model sizes on …

CP-JKU submission to DCASE23: Efficient acoustic scene classification with CP-Mobile

F Schmid, T Morocutti, S Masoudian, K Koutini… - 2023 - dcase.community
In this technical report, we describe the CP-JKU team's submission for Task 1
Low-Complexity Acoustic Scene Classification of the DCASE 23 challenge. We introduce a novel …