An overview of deep-learning-based audio-visual speech enhancement and separation

D Michelsanti, ZH Tan, SX Zhang, Y Xu… - … on Audio, Speech …, 2021 - ieeexplore.ieee.org
Speech enhancement and speech separation are two related tasks, whose purpose is to
extract either one or more target speech signals, respectively, from a mixture of sounds …

A review on explainability in multimodal deep neural nets

G Joshi, R Walambe, K Kotecha - IEEE Access, 2021 - ieeexplore.ieee.org
Artificial Intelligence techniques powered by deep neural nets have achieved much success
in several application domains, most significantly and notably in the Computer Vision …

A deep learning-based radar and camera sensor fusion architecture for object detection

F Nobis, M Geisslinger, M Weber, J Betz… - 2019 Sensor Data …, 2019 - ieeexplore.ieee.org
Object detection in camera images, using deep learning has been proven successfully in
recent years. Rising detection rates and computationally efficient network structures are …

Modality competition: What makes joint training of multi-modal network fail in deep learning? (provably)

Y Huang, J Lin, C Zhou, H Yang… - … conference on machine …, 2022 - proceedings.mlr.press
Despite the remarkable success of deep multi-modal learning in practice, it has not been
well-explained in theory. Recently, it has been observed that the best uni-modal network …

Characterizing and overcoming the greedy nature of learning in multi-modal deep neural networks

N Wu, S Jastrzebski, K Cho… - … Conference on Machine …, 2022 - proceedings.mlr.press
We hypothesize that due to the greedy nature of learning in multi-modal deep neural
networks, these models tend to rely on just one modality while under-fitting the other …

Deep multimodal fusion for semantic image segmentation: A survey

Y Zhang, D Sidibé, O Morel, F Mériaudeau - Image and Vision Computing, 2021 - Elsevier
Recent advances in deep learning have shown excellent performance in various scene
understanding tasks. However, in some complex environments or under challenging …

Rice-fusion: A multimodality data fusion framework for rice disease diagnosis

RR Patil, S Kumar - IEEE Access, 2022 - ieeexplore.ieee.org
Rice leaf infections are a common hazard to rice production, affecting many farmers all over
the world. Early detection and treatment of rice leaf infection are critical for promoting healthy …

Graph neural networks in IoT: A survey

G Dong, M Tang, Z Wang, J Gao, S Guo, L Cai… - ACM Transactions on …, 2023 - dl.acm.org
The Internet of Things (IoT) boom has revolutionized almost every corner of people's daily
lives: healthcare, environment, transportation, manufacturing, supply chain, and so on. With …

M3ER: Multiplicative multimodal emotion recognition using facial, textual, and speech cues

T Mittal, U Bhattacharya, R Chandra, A Bera… - Proceedings of the AAAI …, 2020 - aaai.org
We present M3ER, a learning-based method for emotion recognition from multiple input
modalities. Our approach combines cues from multiple co-occurring modalities (such as …

Deep learning on multi sensor data for counter UAV applications—A systematic review

S Samaras, E Diamantidou, D Ataloglou, N Sakellariou… - Sensors, 2019 - mdpi.com
Usage of Unmanned Aerial Vehicles (UAVs) is growing rapidly in a wide range of consumer
applications, as they prove to be both autonomous and flexible in a variety of environments …