- Academic resource search

Comparative analysis on cross-modal information retrieval: A review

P Kaur, HS Pannu, AK Malhi - Computer Science Review, 2021 - Elsevier

Human beings experience life through a spectrum of modes such as vision, taste, hearing,
smell, and touch. These multiple modes are integrated for information processing in our …

Cited by 112 - Related articles - All 3 versions

[PDF] arxiv.org

Multimodal conversational AI: A survey of datasets and approaches

A Sundar, L Heck - arXiv preprint arXiv:2205.06907, 2022 - arxiv.org

As humans, we experience the world with all our senses or modalities (sound, sight, touch,
smell, and taste). We use these modalities, particularly sight and touch, to convey and …

Cited by 40 - Related articles - All 6 versions

[PDF] researchgate.net

STAR++: Rethinking spatio-temporal cross attention transformer for video action recognition

D Ahn, S Kim, BC Ko - Applied Intelligence, 2023 - Springer

Video action recognition needs to model any differences by subdividing the spatio-temporal
features to distinguish various actions. We propose rethinking spatio-temporal cross …

Cited by 8 - Related articles - All 3 versions

[PDF] thecvf.com

Multi-modal multi-action video recognition

Z Shi, J Liang, Q Li, H Zheng, Z Gu… - Proceedings of the …, 2021 - openaccess.thecvf.com

Multi-action video recognition is much more challenging due to the requirement to recognize
multiple actions co-occurring simultaneously or sequentially. Modeling multi-action relations …

Cited by 11 - Related articles - All 3 versions

[PDF] ieee.org

Semantic image collection summarization with frequent subgraph mining

A Pasini, F Giobergia, E Pastor, E Baralis - IEEE Access, 2022 - ieeexplore.ieee.org

Applications such as providing a preview of personal albums (e.g., Google Photos) or
suggesting thematic collections based on user interests (e.g., Pinterest) require a …

Cited by 9 - Related articles - All 4 versions

[PDF] arxiv.org

Disentangled action recognition with knowledge bases

Z Luo, S Ghosh, D Guillory, K Kato, T Darrell… - arXiv preprint arXiv …, 2022 - arxiv.org

Action in video usually involves the interaction of human with objects. Action labels are
typically composed of various combinations of verbs and nouns, but we may not have …

Cited by 6 - Related articles - All 5 versions

Sentiment analysis of linguistic cues to assist medical image classification

P Kaur, AK Malhi, HS Pannu - Multimedia Tools and Applications, 2024 - Springer

Image classification is a challenging problem and often suffers from the bottleneck of visual
features. With the ever-growing availability of multimedia data with the help of the Internet …

Cited by 3 - Related articles

[PDF] arxiv.org

Advancing Perception in Artificial Intelligence through Principles of Cognitive Science

P Agrawal, C Tan, H Rathore - arXiv preprint arXiv:2310.08803, 2023 - arxiv.org

Although artificial intelligence (AI) has achieved many feats at a rapid pace, there still exist
open problems and fundamental shortcomings related to performance and resource …

Cited by 2 - Related articles - All 2 versions

[HTML] frontiersin.org

[HTML] DPED: Bio-inspired dual-pathway network for edge detection

Y Chen, C Lin, Y Qiao - Frontiers in Bioengineering and …, 2022 - frontiersin.org

As the basis of high-level visual tasks, edge detection is significant. Most of the encoder-
decoder edge detection methods used convolutional neural networks, such as VGG16 or …

Cited by 7 - Related articles - All 6 versions

Taohighlight: Commodity-aware multi-modal video highlight detection in e-commerce

Z Guo, Z Zhao, W Jin, D Wang, R Liu… - IEEE Transactions on …, 2021 - ieeexplore.ieee.org

In e-commerce, product related video is important content to introduce product
characteristics and attract consumers. Especially in the recommendation system of e …

Cited by 11 - Related articles - All 2 versions
