An overview of cross-media retrieval: Concepts, methodologies, benchmarks, and challenges

Y Peng, X Huang, Y Zhao - … on Circuits and Systems for Video …, 2017 - ieeexplore.ieee.org
Multimedia retrieval plays an indispensable role in big data utilization. Past efforts mainly
focused on single-media retrieval. However, the requirements of users are highly flexible …

Comparative analysis on cross-modal information retrieval: A review

P Kaur, HS Pannu, AK Malhi - Computer Science Review, 2021 - Elsevier
Human beings experience life through a spectrum of modes such as vision, taste, hearing,
smell, and touch. These multiple modes are integrated for information processing in our …

Rescaling egocentric vision: Collection, pipeline and challenges for epic-kitchens-100

D Damen, H Doughty, GM Farinella, A Furnari… - International Journal of …, 2022 - Springer
This paper introduces the pipeline to extend the largest dataset in egocentric vision,
EPIC-KITCHENS. The effort culminates in EPIC-KITCHENS-100, a collection of 100 hours, 20M …

Learning discriminative binary codes for large-scale cross-modal retrieval

X Xu, F Shen, Y Yang, HT Shen… - IEEE Transactions on …, 2017 - ieeexplore.ieee.org
Hashing based methods have attracted considerable attention for efficient cross-modal
retrieval on large-scale multimedia data. The core problem of cross-modal hashing is how to …

Learning a joint affinity graph for multiview subspace clustering

C Tang, X Zhu, X Liu, M Li, P Wang… - IEEE Transactions …, 2018 - ieeexplore.ieee.org
With the ability to exploit the internal structure of data, graph-based models have received a
lot of attention and have achieved great success in multiview subspace clustering for …

Exploiting subspace relation in semantic labels for cross-modal hashing

HT Shen, L Liu, Y Yang, X Xu, Z Huang… - … on Knowledge and …, 2020 - ieeexplore.ieee.org
Hashing methods have been extensively applied to efficient multimedia data indexing and
retrieval on account of the explosion of multimedia data. Cross-modal hashing usually …

CM-GANs: Cross-modal generative adversarial networks for common representation learning

Y Peng, J Qi - ACM Transactions on Multimedia Computing …, 2019 - dl.acm.org
It is known that the inconsistent distributions and representations of different modalities, such
as image and text, cause the heterogeneity gap, which makes it very challenging to correlate …

Deep multi-view subspace clustering with unified and discriminative learning

Q Wang, J Cheng, Q Gao, G Zhao… - IEEE Transactions on …, 2020 - ieeexplore.ieee.org
Deep multi-view subspace clustering has achieved promising performance compared with
other multi-view clustering methods. However, existing deep multi-view subspace clustering methods only …

Know more say less: Image captioning based on scene graphs

X Li, S Jiang - IEEE Transactions on Multimedia, 2019 - ieeexplore.ieee.org
Automatically describing the content of an image has been attracting considerable research
attention in the multimedia field. To represent the content of an image, many approaches …

Unsupervised person re-identification by deep asymmetric metric embedding

HX Yu, A Wu, WS Zheng - IEEE Transactions on Pattern …, 2018 - ieeexplore.ieee.org
Person re-identification (Re-ID) aims to match identities across non-overlapping camera
views. Researchers have proposed many supervised Re-ID models which require quantities …