Multimodal fusion framework: A multiresolution approach for emotion classification and recognition from physiological signals

GK Verma, US Tiwary - NeuroImage, 2014 - Elsevier
The purpose of this paper is twofold: (i) to investigate the emotion representation models and
find out the possibility of a model with a minimum number of continuous dimensions and (ii) to …

Feature selection with multi-view data: A survey

R Zhang, F Nie, X Li, X Wei - Information Fusion, 2019 - Elsevier
This survey aims at providing a state-of-the-art overview of feature selection and fusion
strategies, which select and combine multi-view features effectively to accomplish …

LipNet: End-to-end sentence-level lipreading

YM Assael, B Shillingford, S Whiteson… - arXiv preprint arXiv …, 2016 - arxiv.org
Lipreading is the task of decoding text from the movement of a speaker's mouth. Traditional
approaches separated the problem into two stages: designing or learning visual features …

Multimodal deep learning

J Ngiam, A Khosla, M Kim, J Nam, H Lee… - Proceedings of the 28th …, 2011 - ai.stanford.edu
Deep networks have been successfully applied to unsupervised feature learning for single
modalities (e.g., text, images or audio). In this work, we propose a novel application of deep …

A cross-disciplinary comparison of multimodal data fusion approaches and applications: Accelerating learning through trans-disciplinary information sharing

R Bokade, A Navato, R Ouyang, X Jin, CA Chou… - Expert Systems with …, 2021 - Elsevier
Multimodal data fusion (MMDF) is the process of combining disparate data streams (of
different dimensionality, resolution, type, etc.) to generate information in a form that is more …

Multimodal fusion for multimedia analysis: a survey

PK Atrey, MA Hossain, A El Saddik, MS Kankanhalli - Multimedia Systems, 2010 - Springer
This survey aims at providing multimedia researchers with a state-of-the-art overview of
fusion strategies, which are used for combining multiple modalities in order to accomplish …

Large-scale visual speech recognition

B Shillingford, Y Assael, MW Hoffman, T Paine… - arXiv preprint arXiv …, 2018 - arxiv.org
This work presents a scalable solution to open-vocabulary visual speech recognition. To
achieve this, we constructed the largest existing visual speech recognition dataset …

Deep multimodal learning for audio-visual speech recognition

Y Mroueh, E Marcheret, V Goel - 2015 IEEE International …, 2015 - ieeexplore.ieee.org
In this paper, we present methods in deep multimodal learning for fusing speech and visual
modalities for Audio-Visual Automatic Speech Recognition (AV-ASR). First, we study an …

A 3D-convolutional neural network framework with ensemble learning techniques for multi-modal emotion recognition

ES Salama, RA El-Khoribi, ME Shoman… - Egyptian Informatics …, 2021 - Elsevier
Nowadays, human emotion recognition is an essential task in many human-machine
interaction fields. This paper proposes a novel multi-modal human emotion recognition …