Comparative analysis on cross-modal information retrieval: A review

P Kaur, HS Pannu, AK Malhi - Computer Science Review, 2021 - Elsevier
Human beings experience life through a spectrum of modes such as vision, taste, hearing,
smell, and touch. These multiple modes are integrated for information processing in our …

Multimodal conversational AI: A survey of datasets and approaches

A Sundar, L Heck - arXiv preprint arXiv:2205.06907, 2022 - arxiv.org
As humans, we experience the world with all our senses or modalities (sound, sight, touch,
smell, and taste). We use these modalities, particularly sight and touch, to convey and …

STAR++: Rethinking spatio-temporal cross attention transformer for video action recognition

D Ahn, S Kim, BC Ko - Applied Intelligence, 2023 - Springer
Video action recognition needs to model any differences by subdividing the spatio-temporal
features to distinguish various actions. We propose rethinking spatio-temporal cross …

Multi-modal multi-action video recognition

Z Shi, J Liang, Q Li, H Zheng, Z Gu… - Proceedings of the …, 2021 - openaccess.thecvf.com
Multi-action video recognition is much more challenging due to the requirement to recognize
multiple actions co-occurring simultaneously or sequentially. Modeling multi-action relations …

Semantic image collection summarization with frequent subgraph mining

A Pasini, F Giobergia, E Pastor, E Baralis - IEEE Access, 2022 - ieeexplore.ieee.org
Applications such as providing a preview of personal albums (e.g., Google Photos) or
suggesting thematic collections based on user interests (e.g., Pinterest) require a …

Disentangled action recognition with knowledge bases

Z Luo, S Ghosh, D Guillory, K Kato, T Darrell… - arXiv preprint arXiv …, 2022 - arxiv.org
Action in video usually involves the interaction of humans with objects. Action labels are
typically composed of various combinations of verbs and nouns, but we may not have …

Sentiment analysis of linguistic cues to assist medical image classification

P Kaur, AK Malhi, HS Pannu - Multimedia Tools and Applications, 2024 - Springer
Image classification is a challenging problem and often suffers from the bottleneck of visual
features. With the ever-growing availability of multimedia data with the help of the Internet …

Advancing Perception in Artificial Intelligence through Principles of Cognitive Science

P Agrawal, C Tan, H Rathore - arXiv preprint arXiv:2310.08803, 2023 - arxiv.org
Although artificial intelligence (AI) has achieved many feats at a rapid pace, there still exist
open problems and fundamental shortcomings related to performance and resource …

DPED: Bio-inspired dual-pathway network for edge detection

Y Chen, C Lin, Y Qiao - Frontiers in Bioengineering and …, 2022 - frontiersin.org
As the basis of high-level visual tasks, edge detection is significant. Most of the encoder-
decoder edge detection methods used convolutional neural networks, such as VGG16 or …

TaoHighlight: Commodity-aware multi-modal video highlight detection in e-commerce

Z Guo, Z Zhao, W Jin, D Wang, R Liu… - IEEE Transactions on …, 2021 - ieeexplore.ieee.org
In e-commerce, product-related video is important content to introduce product
characteristics and attract consumers. Especially in the recommendation system of e …