A Sundar, L Heck - arXiv preprint arXiv:2205.06907, 2022 - arxiv.org
As humans, we experience the world with all our senses or modalities (sound, sight, touch, smell, and taste). We use these modalities, particularly sight and touch, to convey and …
Video action recognition needs to model any differences by subdividing the spatio-temporal features to distinguish various actions. We propose rethinking spatio-temporal cross …
Z Shi, J Liang, Q Li, H Zheng, Z Gu… - Proceedings of the …, 2021 - openaccess.thecvf.com
Multi-action video recognition is much more challenging due to the requirement to recognize multiple actions co-occurring simultaneously or sequentially. Modeling multi-action relations …
Applications such as providing a preview of personal albums (e.g., Google Photos) or suggesting thematic collections based on user interests (e.g., Pinterest) require a …
Action in video usually involves the interaction of humans with objects. Action labels are typically composed of various combinations of verbs and nouns, but we may not have …
Image classification is a challenging problem and often suffers from the bottleneck of visual features. With the ever-growing availability of multimedia data with the help of the Internet …
Although artificial intelligence (AI) has achieved many feats at a rapid pace, there still exist open problems and fundamental shortcomings related to performance and resource …
Y Chen, C Lin, Y Qiao - Frontiers in Bioengineering and …, 2022 - frontiersin.org
As the basis of high-level visual tasks, edge detection is significant. Most of the encoder-decoder edge detection methods used convolutional neural networks, such as VGG16 or …
Z Guo, Z Zhao, W Jin, D Wang, R Liu… - IEEE Transactions on …, 2021 - ieeexplore.ieee.org
In e-commerce, product related video is important content to introduce product characteristics and attract consumers. Especially in the recommendation system of e …