Audio tampering forensics based on representation learning of ENF phase sequence

C Zeng, Y Yang, Z Wang, S Kong… - International Journal of …, 2022 - igi-global.com
This paper proposes an audio tampering detection method based on the ENF phase and
BI-LSTM network from the perspective of temporal feature representation learning. First, the …

Stacked auto-encoders based visual features for speech/music classification

A Kumar, SS Solanki, M Chandra - Expert Systems with Applications, 2022 - Elsevier
With the rapid rise of online available content, multimedia signal processing has become an
important area of research. The output of the speech/music classifier (SMC) is further used …

Automatic text-independent speaker verification using convolutional deep belief network

IA Rakhmanenko, AA Shelupanov… - Computer …, 2020 - ui.adsabs.harvard.edu
This paper is devoted to the use of the convolutional deep belief network as a speech
feature extractor for automatic text-independent speaker verification. The paper describes …

Photovoltaic panel defect detection based on ghost convolution with BottleneckCSP and tiny target prediction head incorporating YOLOv5

L Li, Z Wang, T Zhang - arXiv preprint arXiv:2303.00886, 2023 - arxiv.org
Photovoltaic (PV) panel surface-defect detection technology is crucial for the PV industry to
perform smart maintenance. Using computer vision technology to detect PV panel surface …

Learning behavior recognition in smart classroom with multiple students based on YOLOv5

Z Wang, J Yao, C Zeng, W Wu, H Xu, Y Yang - arXiv preprint arXiv …, 2023 - arxiv.org
Deep learning-based computer vision technology has grown stronger in recent years, and
cross-fertilization using computer vision technology has been a popular direction in recent …

DSARSR: Deep Stacked Auto-encoders Enhanced Robust Speaker Recognition

Z Wang, C Zeng, S Duan, H Ouyang, H Xu - arXiv preprint arXiv …, 2023 - arxiv.org
Speaker recognition is a biometric modality that utilizes the speaker's speech segments to
recognize the identity, determining whether the test speaker belongs to one of the enrolled …

Multi-Scale Deformable Transformers for Student Learning Behavior Detection in Smart Classroom

Z Wang, M Wang, C Zeng, L Li - arXiv preprint arXiv:2410.07834, 2024 - arxiv.org
The integration of Artificial Intelligence into the modern educational system is rapidly
evolving, particularly in monitoring student behavior in classrooms, a task traditionally …

DKT-STDRL: Spatial and Temporal Representation Learning Enhanced Deep Knowledge Tracing for Learning Performance Prediction

L Lyu, Z Wang, H Yun, Z Yang, Y Li - arXiv preprint arXiv:2302.11569, 2023 - arxiv.org
Knowledge tracing (KT) serves as a primary part of intelligent education systems. Most
current KTs either rely on expert judgments or only exploit a single network structure, which …

PhysioFormer: Integrating Multimodal Physiological Signals and Symbolic Regression for Explainable Affective State Prediction

Z Wang, W Wu, C Zeng - arXiv preprint arXiv:2410.11376, 2024 - arxiv.org
Most affective computing tasks still rely heavily on traditional methods, with few deep
learning models applied, particularly in multimodal signal processing. Given the importance …

End-to-end Recording Device Identification Based on Deep Representation Learning

C Zeng, D Zhu, Z Wang, M Wu, W Xiong… - arXiv preprint arXiv …, 2022 - arxiv.org
Deep learning techniques have achieved specific results in recording device source
identification. The recording device source features include spatial information and certain …