Deep high-resolution representation learning for visual recognition

J Wang, K Sun, T Cheng, B Jiang… - IEEE transactions on …, 2020 - ieeexplore.ieee.org
representations are essential for position-sensitive vision … representation through a
subnetwork that is formed by … high-resolution representation from the encoded low-resolution …

Towards universal representation learning for deep face recognition

Y Shi, X Yu, K Sohn, M Chandraker… - … pattern recognition, 2020 - openaccess.thecvf.com
… Instead, we propose a universal representation learning face recognition framework, URFace,
that can deal with larger variations unseen in the given training data, without leveraging …

Visual transformers: Token-based image representation and processing for computer vision

B Wu, C Xu, X Dai, A Wan, P Zhang, Z Yan… - arXiv preprint arXiv …, 2020 - arxiv.org
Computer vision has achieved remarkable success by (a) representing images as uniformly-arranged
pixel arrays and (b) convolving highly-localized features. However, convolutions …

Volo: Vision outlooker for visual recognition

L Yuan, Q Hou, Z Jiang, J Feng… - IEEE transactions on …, 2022 - ieeexplore.ieee.org
… -to-token representation learning first proposed in our conference version with outlook
attention and presented a new model, Vision Outlooker (VOLO), for solving computer vision tasks. …

Seeing out of the box: End-to-end pre-training for vision-language representation learning

Z Huang, Z Zeng, Y Huang, B Liu… - … pattern recognition, 2021 - openaccess.thecvf.com
… visual representations … vision-language tasks [17] or vision recognition tasks [9, 32]. Our work
shares a similar format of visual representation with [17] while we focus on the area of vision-…

Vision mamba: Efficient visual representation learning with bidirectional state space model

L Zhu, B Liao, Q Zhang, X Wang, W Liu… - arXiv preprint arXiv …, 2024 - arxiv.org
… the success of Mamba to vision, i.e., building a generic vision backbone purely upon … Inspired
by ViT [14] and BERT [31], we also use class token to represent the whole patch sequence, …

Learning semantic-specific graph representation for multi-label image recognition

T Chen, M Xu, X Hui, H Wu… - … on computer vision, 2019 - openaccess.thecvf.com
… To address these issues, we propose a Semantic-Specific Graph Representation Learning
(… representations and 2) a semantic interaction module that correlates these representations

A comprehensive survey of vision-based human action recognition methods

HB Zhang, YX Zhang, B Zhong, Q Lei, L Yang, JX Du… - Sensors, 2019 - mdpi.com
… Feature representation and selection is a classic problem in computer vision and machine
learning [8]. Unlike feature representation in an image space, the feature representation of …

Gloria: A multimodal global-local representation learning framework for label-efficient medical image recognition

SC Huang, L Shen, MP Lungren… - … on Computer Vision, 2021 - openaccess.thecvf.com
… label-efficient multimodal medical imaging representations by leveraging radiology reports.
… the learned representations for various downstream medical image recognition tasks with …

12-in-1: Multi-task vision and language representation learning

J Lu, V Goswami, M Rohrbach… - … pattern recognition, 2020 - openaccess.thecvf.com
Much of vision-and-language research focuses on a small but diverse set of independent
tasks and supporting datasets often studied in isolation; however, the visually-grounded …