Multimodal intelligence: Representation learning, information fusion, and applications

C Zhang, Z Yang, X He, L Deng - IEEE Journal of Selected …, 2020 - ieeexplore.ieee.org
Deep learning methods have revolutionized speech recognition, image recognition, and
natural language processing since 2010. Each of these tasks involves a single modality in …

A review on explainability in multimodal deep neural nets

G Joshi, R Walambe, K Kotecha - IEEE Access, 2021 - ieeexplore.ieee.org
Artificial Intelligence techniques powered by deep neural nets have achieved much success
in several application domains, most significantly and notably in the Computer Vision …

Making sense of vision and touch: Self-supervised learning of multimodal representations for contact-rich tasks

MA Lee, Y Zhu, K Srinivasan, P Shah… - … on robotics and …, 2019 - ieeexplore.ieee.org
Contact-rich manipulation tasks in unstructured environments often require both haptic and
visual feedback. However, it is non-trivial to manually design a robot controller that …

Audio-visual event localization in unconstrained videos

Y Tian, J Shi, B Li, Z Duan, C Xu - Proceedings of the …, 2018 - openaccess.thecvf.com
In this paper, we introduce a novel problem of audio-visual event localization in
unconstrained videos. We define an audio-visual event as an event that is both visible and …

State representation learning for control: An overview

T Lesort, N Díaz-Rodríguez, JF Goudou, D Filliat - Neural Networks, 2018 - Elsevier
Abstract Representation learning algorithms are designed to learn abstract features that
characterize data. State representation learning (SRL) focuses on a particular kind of …

Making sense of vision and touch: Learning multimodal representations for contact-rich tasks

MA Lee, Y Zhu, P Zachares, M Tan… - IEEE Transactions …, 2020 - ieeexplore.ieee.org
Contact-rich manipulation tasks in unstructured environments often require both haptic and
visual feedback. It is nontrivial to manually design a robot controller that combines these …

Automatic driver stress level classification using multimodal deep learning

MN Rastgoo, B Nakisa, F Maire, A Rakotonirainy… - Expert Systems with …, 2019 - Elsevier
Stress has been identified as one of the contributing factors to vehicle crashes which create
a significant cost in terms of loss of life and productivity for governments and societies …

ELGAR—a European laboratory for gravitation and atom-interferometric research

B Canuel, S Abend, P Amaro-Seoane… - … and Quantum Gravity, 2020 - iopscience.iop.org
Gravitational waves (GWs) were observed for the first time in 2015, one century after
Einstein predicted their existence. There is now growing interest to extend the detection …

Dual-modality seq2seq network for audio-visual event localization

YB Lin, YJ Li, YCF Wang - ICASSP 2019-2019 IEEE …, 2019 - ieeexplore.ieee.org
Audio-visual event localization requires one to identify the event which is both visible and
audible in a video (either at a frame or video level). To address this task, we propose a deep …

Beyond just vision: A review on self-supervised representation learning on multimodal and temporal data

S Deldari, H Xue, A Saeed, J He, DV Smith… - arXiv preprint arXiv …, 2022 - arxiv.org
Recently, Self-Supervised Representation Learning (SSRL) has attracted much attention in
the field of computer vision, speech, natural language processing (NLP), and recently, with …