Multimodal intelligence: Representation learning, information fusion, and applications

C Zhang, Z Yang, X He, L Deng - IEEE Journal of Selected …, 2020 - ieeexplore.ieee.org
Deep learning methods have revolutionized speech recognition, image recognition, and
natural language processing since 2010. Each of these tasks involves a single modality in …

A review on explainability in multimodal deep neural nets

G Joshi, R Walambe, K Kotecha - IEEE Access, 2021 - ieeexplore.ieee.org
Artificial Intelligence techniques powered by deep neural nets have achieved much success
in several application domains, most significantly and notably in the Computer Vision …

Making sense of vision and touch: Self-supervised learning of multimodal representations for contact-rich tasks

MA Lee, Y Zhu, K Srinivasan, P Shah… - … on robotics and …, 2019 - ieeexplore.ieee.org
Contact-rich manipulation tasks in unstructured environments often require both haptic and
visual feedback. However, it is non-trivial to manually design a robot controller that …

Audio-visual event localization in unconstrained videos

Y Tian, J Shi, B Li, Z Duan, C Xu - Proceedings of the …, 2018 - openaccess.thecvf.com
In this paper, we introduce a novel problem of audio-visual event localization in
unconstrained videos. We define an audio-visual event as an event that is both visible and …

State representation learning for control: An overview

T Lesort, N Díaz-Rodríguez, JF Goudou, D Filliat - Neural Networks, 2018 - Elsevier
Abstract Representation learning algorithms are designed to learn abstract features that
characterize data. State representation learning (SRL) focuses on a particular kind of …

Making sense of vision and touch: Learning multimodal representations for contact-rich tasks

MA Lee, Y Zhu, P Zachares, M Tan… - IEEE Transactions …, 2020 - ieeexplore.ieee.org
Contact-rich manipulation tasks in unstructured environments often require both haptic and
visual feedback. It is nontrivial to manually design a robot controller that combines these …

Automatic driver stress level classification using multimodal deep learning

MN Rastgoo, B Nakisa, F Maire, A Rakotonirainy… - Expert Systems with …, 2019 - Elsevier
Stress has been identified as one of the contributing factors to vehicle crashes which create
a significant cost in terms of loss of life and productivity for governments and societies …

ELGAR—a European laboratory for gravitation and atom-interferometric research

B Canuel, S Abend, P Amaro-Seoane… - … and Quantum Gravity, 2020 - iopscience.iop.org
Gravitational waves (GWs) were observed for the first time in 2015, one century after
Einstein predicted their existence. There is now growing interest to extend the detection …

Dual-modality seq2seq network for audio-visual event localization

YB Lin, YJ Li, YCF Wang - ICASSP 2019-2019 IEEE …, 2019 - ieeexplore.ieee.org
Audio-visual event localization requires one to identify the event which is both visible and
audible in a video (either at a frame or video level). To address this task, we propose a deep …

Beyond just vision: A review on self-supervised representation learning on multimodal and temporal data

S Deldari, H Xue, A Saeed, J He, DV Smith… - arXiv preprint arXiv …, 2022 - arxiv.org
Recently, Self-Supervised Representation Learning (SSRL) has attracted much attention in
the field of computer vision, speech, natural language processing (NLP), and recently, with …