Hybrid Attention Time-Frequency Analysis Network for Single-Channel Speech Enhancement

Z Zhang, X Liang, R Xu, M Wang - ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2024 - ieeexplore.ieee.org
The time-frequency domain remains central to speech signal analysis. Enhancing the efficacy of neural network-based speech models demands a detailed multi-scale analysis of time-frequency features. This study presents the Hybrid Attention Time-Frequency Analysis Network (HATFANet), an innovative model that uses a dual-branch structure to concurrently estimate the ideal ratio mask and the enhanced complex spectrum. Each branch incorporates Hybrid Attention Blocks (HABs) that capture local, global, and inter-window attention for more effective deep feature extraction, employing reshaping techniques and gated multi-layer perceptrons to focus on different attention scales. The addition of residual channel attention and a window multi-head self-attention mechanism accentuates channel-attention features and intra-window attention. Our experiments verify the pivotal role of these HABs across varied attentional scales. HATFANet achieves state-of-the-art results on the Voice Bank + DEMAND dataset, recording a PESQ of 3.37, STOI of 95.8%, and SSNR of 10.15.
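As background for the mask-estimation branch: the abstract does not give HATFANet's exact mask formulation, but a minimal numpy sketch of the commonly used ideal ratio mask (IRM) definition, sqrt(|S|^2 / (|S|^2 + |N|^2)) over clean and noise spectra, looks like this. The function name and the epsilon stabilizer are illustrative, not from the paper.

```python
import numpy as np

def ideal_ratio_mask(clean_spec, noise_spec, eps=1e-8):
    """Ideal ratio mask from clean- and noise-spectrum magnitudes.

    Uses the standard definition sqrt(|S|^2 / (|S|^2 + |N|^2)),
    which lies in [0, 1] per time-frequency bin; `eps` avoids
    division by zero in silent bins.
    """
    s2 = np.abs(clean_spec) ** 2
    n2 = np.abs(noise_spec) ** 2
    return np.sqrt(s2 / (s2 + n2 + eps))

# Two illustrative bins: one speech-dominated, one noise-dominated.
clean = np.array([1.0, 0.1])
noise = np.array([0.1, 1.0])
mask = ideal_ratio_mask(clean, noise)
# Multiplying the noisy magnitude spectrum by this mask attenuates
# noise-dominated bins while passing speech-dominated ones.
```

A network trained to predict this mask from the noisy spectrum can then enhance unseen audio by elementwise multiplication; the complex-spectrum branch described in the abstract instead recovers phase information that a magnitude-only mask discards.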