Contrastive Learning for Sign Language Recognition and Translation

S Gan, Y Yin, Z Jiang, K Xia, L Xie, S Lu - IJCAI, 2023 - ijcai.org
Abstract
Two problems are widespread in current end-to-end sign language processing architectures. One is the CTC spike phenomenon, which weakens visual representational ability in Continuous Sign Language Recognition (CSLR). The other is the exposure bias problem, which leads to the accumulation of translation errors during inference in Sign Language Translation (SLT). In this paper, we tackle these issues by introducing contrastive learning, aiming to enhance both visual-level feature representation and semantic-level error tolerance. Specifically, to alleviate the CTC spike phenomenon and enhance visual-level representation, we design a visual contrastive loss that minimizes the visual feature distance between different augmented samples of frames in one sign video, so that the model can further explore features by utilizing numerous unlabeled frames in an unsupervised way. To alleviate the exposure bias problem and improve semantic-level error tolerance, we design a semantic contrastive loss that re-inputs the predicted sentence into the semantic module and compares the features of the ground-truth sequence and the predicted sequence, exposing the model to its own mistakes. In addition, we propose two new metrics, i.e., Blank Rate and Consecutive Wrong Word Rate, to directly reflect our improvement on the two problems. Extensive experimental results on current sign language datasets demonstrate the effectiveness of our approach, which achieves state-of-the-art performance.
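
The visual contrastive idea in the abstract (pulling together the features of differently augmented copies of the same frames) can be illustrated with a small sketch. The abstract does not give the exact form of the loss, so everything here is an assumption: cosine distance stands in for the "visual feature distance", only positive pairs are used, and the names `cosine_distance` and `visual_contrastive_distance` are hypothetical. In practice the features would come from the visual encoder; plain lists of floats are used here to keep the example self-contained.

```python
import math


def cosine_distance(u, v, eps=1e-8):
    """Return 1 - cosine similarity of two feature vectors (assumed distance)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    # eps guards against division by zero for all-zero feature vectors.
    return 1.0 - dot / (norm_u * norm_v + eps)


def visual_contrastive_distance(view_a, view_b):
    """Average per-frame distance between two augmented views of one video.

    view_a[t] and view_b[t] are the feature vectors of frame t under two
    different augmentations. Minimizing the returned value pulls the two
    views of every frame together; no gloss labels are needed.
    """
    if len(view_a) != len(view_b) or not view_a:
        raise ValueError("views must be non-empty and cover the same frames")
    total = sum(cosine_distance(u, v) for u, v in zip(view_a, view_b))
    return total / len(view_a)


# Two frames, two augmented views each (hypothetical 2-D features).
view_a = [[1.0, 0.0], [0.0, 2.0]]
view_b = [[0.9, 0.1], [0.1, 1.8]]
loss = visual_contrastive_distance(view_a, view_b)
```

Because the objective depends only on pairs of views of the same frame, it can be computed on every frame of a video, which matches the abstract's point about using unlabeled frames in an unsupervised way.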