Masked autoencoders as spatiotemporal learners

N Parthasarathy, SM Eslami… - Advances in Neural …, 2023 - proceedings.neurips.cc

Humans learn powerful representations of objects and scenes by observing how they evolve
over time. Yet, outside of specific tasks that require explicit temporal understanding, static …

Cited by 5 Related articles All 3 versions

[PDF] thecvf.com

Mv-jar: Masked voxel jigsaw and reconstruction for lidar-based self-supervised pre-training

R Xu, T Wang, W Zhang, R Chen… - Proceedings of the …, 2023 - openaccess.thecvf.com

This paper introduces the Masked Voxel Jigsaw and Reconstruction (MV-JAR) method for
LiDAR-based self-supervised pre-training and a carefully designed data-efficient 3D object …

Cited by 18 Related articles All 5 versions

[PDF] arxiv.org

Self-supervised remote sensing feature learning: Learning paradigms, challenges, and future works

C Tao, J Qi, M Guo, Q Zhu, H Li - IEEE Transactions on …, 2023 - ieeexplore.ieee.org

Deep learning has achieved great success in learning features from massive remote
sensing images (RSIs). To better understand the connection between three feature learning …

Cited by 35 Related articles All 3 versions

[PDF] thecvf.com

Dual-path adaptation from image to video transformers

J Park, J Lee, K Sohn - … of the IEEE/CVF Conference on …, 2023 - openaccess.thecvf.com

In this paper, we efficiently transfer the surpassing representation power of the vision
foundation models, such as ViT and Swin, for video understanding with only a few trainable …

Cited by 17 Related articles All 6 versions

[PDF] arxiv.org

Recontab: Regularized contrastive representation learning for tabular data

S Chen, J Wu, N Hovakimyan, H Yao - arXiv preprint arXiv:2310.18541, 2023 - arxiv.org

Representation learning stands as one of the critical machine learning techniques across
various domains. Through the acquisition of high-quality features, pre-trained embeddings …

Cited by 24 Related articles All 4 versions

[PDF] ieee.org

Unlocking the emotional world of visual media: An overview of the science, research, and impact of understanding emotion

JZ Wang, S Zhao, C Wu, RB Adams… - Proceedings of the …, 2023 - ieeexplore.ieee.org

The emergence of artificial emotional intelligence technology is revolutionizing the fields of
computers and robotics, allowing for a new level of communication and understanding of …

Cited by 21 Related articles All 16 versions

[PDF] thecvf.com

Affordance grounding from demonstration video to target image

J Chen, D Gao, KQ Lin… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com

Humans excel at learning from expert demonstrations and solving their own problems. To
equip intelligent robots and assistants, such as AR glasses, with this ability, it is essential to …

Cited by 17 Related articles All 5 versions

[PDF] thecvf.com

Masked motion encoding for self-supervised video representation learning

X Sun, P Chen, L Chen, C Li, TH Li… - Proceedings of the …, 2023 - openaccess.thecvf.com

How to learn discriminative video representation from unlabeled videos is challenging but
crucial for video analysis. The latest attempts seek to learn a representation model by …

Cited by 20 Related articles All 8 versions

[PDF] acm.org

Mae-dfer: Efficient masked autoencoder for self-supervised dynamic facial expression recognition

L Sun, Z Lian, B Liu, J Tao - Proceedings of the 31st ACM International …, 2023 - dl.acm.org

Dynamic facial expression recognition (DFER) is essential to the development of intelligent
and empathetic machines. Prior efforts in this field mainly fall into supervised learning …

Cited by 20 Related articles All 4 versions

[PDF] arxiv.org

Ponderv2: Pave the way for 3d foundataion model with a universal pre-training paradigm

H Zhu, H Yang, X Wu, D Huang, S Zhang, X He… - arXiv preprint arXiv …, 2023 - arxiv.org

In contrast to numerous NLP and 2D computer vision foundational models, the learning of a
robust and highly generalized 3D foundational model poses considerably greater …

Cited by 16 Related articles All 2 versions
