Self-supervised video pretraining yields robust and more human-aligned visual representations

N Parthasarathy, SM Eslami… - Advances in Neural …, 2023 - proceedings.neurips.cc
Humans learn powerful representations of objects and scenes by observing how they evolve
over time. Yet, outside of specific tasks that require explicit temporal understanding, static …

MV-JAR: Masked voxel jigsaw and reconstruction for LiDAR-based self-supervised pre-training

R Xu, T Wang, W Zhang, R Chen… - Proceedings of the …, 2023 - openaccess.thecvf.com
This paper introduces the Masked Voxel Jigsaw and Reconstruction (MV-JAR) method for
LiDAR-based self-supervised pre-training and a carefully designed data-efficient 3D object …

Self-supervised remote sensing feature learning: Learning paradigms, challenges, and future works

C Tao, J Qi, M Guo, Q Zhu, H Li - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
Deep learning has achieved great success in learning features from massive remote
sensing images (RSIs). To better understand the connection between three feature learning …

Dual-path adaptation from image to video transformers

J Park, J Lee, K Sohn - … of the IEEE/CVF Conference on …, 2023 - openaccess.thecvf.com
In this paper, we efficiently transfer the surpassing representation power of the vision
foundation models, such as ViT and Swin, for video understanding with only a few trainable …

Recontab: Regularized contrastive representation learning for tabular data

S Chen, J Wu, N Hovakimyan, H Yao - arXiv preprint arXiv:2310.18541, 2023 - arxiv.org
Representation learning stands as one of the critical machine learning techniques across
various domains. Through the acquisition of high-quality features, pre-trained embeddings …

Unlocking the emotional world of visual media: An overview of the science, research, and impact of understanding emotion

JZ Wang, S Zhao, C Wu, RB Adams… - Proceedings of the …, 2023 - ieeexplore.ieee.org
The emergence of artificial emotional intelligence technology is revolutionizing the fields of
computers and robotics, allowing for a new level of communication and understanding of …

Affordance grounding from demonstration video to target image

J Chen, D Gao, KQ Lin… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
Humans excel at learning from expert demonstrations and solving their own problems. To
equip intelligent robots and assistants, such as AR glasses, with this ability, it is essential to …

Masked motion encoding for self-supervised video representation learning

X Sun, P Chen, L Chen, C Li, TH Li… - Proceedings of the …, 2023 - openaccess.thecvf.com
How to learn discriminative video representation from unlabeled videos is challenging but
crucial for video analysis. The latest attempts seek to learn a representation model by …

MAE-DFER: Efficient masked autoencoder for self-supervised dynamic facial expression recognition

L Sun, Z Lian, B Liu, J Tao - Proceedings of the 31st ACM International …, 2023 - dl.acm.org
Dynamic facial expression recognition (DFER) is essential to the development of intelligent
and empathetic machines. Prior efforts in this field mainly fall into supervised learning …

PonderV2: Pave the way for 3D foundation model with a universal pre-training paradigm

H Zhu, H Yang, X Wu, D Huang, S Zhang, X He… - arXiv preprint arXiv …, 2023 - arxiv.org
In contrast to numerous NLP and 2D computer vision foundational models, the learning of a
robust and highly generalized 3D foundational model poses considerably greater …