R Zhang, F Nie, X Li, X Wei - Information Fusion, 2019 - Elsevier
This survey aims at providing a state-of-the-art overview of feature selection and fusion strategies, which select and combine multi-view features effectively to accomplish …
Lipreading is the task of decoding text from the movement of a speaker's mouth. Traditional approaches separated the problem into two stages: designing or learning visual features …
Deep networks have been successfully applied to unsupervised feature learning for single modalities (e.g., text, images or audio). In this work, we propose a novel application of deep …
Multimodal data fusion (MMDF) is the process of combining disparate data streams (of different dimensionality, resolution, type, etc.) to generate information in a form that is more …
This survey aims at providing multimedia researchers with a state-of-the-art overview of fusion strategies, which are used for combining multiple modalities in order to accomplish …
This work presents a scalable solution to open-vocabulary visual speech recognition. To achieve this, we constructed the largest existing visual speech recognition dataset …
In this paper, we present methods in deep multimodal learning for fusing speech and visual modalities for Audio-Visual Automatic Speech Recognition (AV-ASR). First, we study an …
ES Salama, RA El-Khoribi, ME Shoman… - Egyptian Informatics …, 2021 - Elsevier
Nowadays, human emotion recognition is a mandatory task for many human machine interaction fields. This paper proposes a novel multi-modal human emotion recognition …