SSHR: Leveraging self-supervised hierarchical representations for multilingual automatic speech recognition

H Xue, Q Shao, K Huang, P Chen, J Liu, L Xie
2024 IEEE International Conference on Multimedia and Expo (ICME), 2024. ieeexplore.ieee.org
Multilingual automatic speech recognition (ASR) systems have garnered attention for their potential to extend language coverage globally. While self-supervised learning (SSL) models, like MMS, have demonstrated their effectiveness in multilingual ASR, it is worth noting that different layers' representations potentially contain distinct information that has not been fully leveraged. In this study, we propose a novel method that leverages self-supervised hierarchical representations (SSHR) to fine-tune the MMS model. We first analyze the different layers of MMS and show that the middle layers capture language-related information while the higher layers encode content-related information, which gradually decreases in the final layers. Then, we extract a language-related frame from correlated middle layers and guide specific language extraction through self-attention mechanisms. Additionally, we steer the model toward acquiring more content-related information in the final layers using our proposed Cross-CTC. We evaluate SSHR on two multilingual datasets, Common Voice and ML-SUPERB, and the experimental results demonstrate that our method achieves, to the best of our knowledge, state-of-the-art performance.
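The abstract's core idea, using a middle layer of the SSL encoder for language-related pooling while keeping the final layer for content (CTC) prediction, can be sketched roughly as follows. This is a minimal illustrative sketch, not the paper's implementation: the layer indices, dimensions, attention scoring vector, and vocabulary size are all toy assumptions, and the paper's self-attention guidance and Cross-CTC objective are only approximated by simple stand-ins.

```python
import numpy as np

# Hypothetical SSHR-style use of hierarchical representations.
# An MMS-style encoder yields one hidden-state tensor per layer:
#   hidden[l] has shape (T, D)  -- T frames, D-dimensional features.
rng = np.random.default_rng(0)
num_layers, T, D = 24, 50, 16          # toy sizes, not MMS's real config
hidden = [rng.standard_normal((T, D)) for _ in range(num_layers)]

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

# 1) Language-related frame: pool a middle layer with attention over
#    frames (the paper guides this with self-attention; the scoring
#    vector here is a hypothetical learned parameter).
mid = hidden[num_layers // 2]          # middle layers carry language info
w = rng.standard_normal(D)             # stand-in scoring vector
attn = softmax(mid @ w)                # (T,) attention weights over frames
lang_frame = attn @ mid                # (D,) pooled language representation

# 2) Content-related output: a CTC-style log-probability projection on
#    the final layer (vocabulary size is a toy assumption).
vocab = 32
proj = rng.standard_normal((D, vocab))
log_probs = np.log(softmax(hidden[-1] @ proj, axis=-1))  # (T, vocab)

print(lang_frame.shape, log_probs.shape)
```

In training, the pooled language frame would feed a language-identification loss while the final-layer projection feeds the CTC loss; the paper's Cross-CTC additionally pushes content information into the last layers, which this sketch does not model.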