Semantic tracklets: An object-centric representation for visual multi-agent reinforcement learning

J Wasserman, K Yadav, G Chowdhary… - … on Robot Learning, 2023 - proceedings.mlr.press

Realistic long-horizon tasks like image-goal navigation involve exploratory and exploitative
phases. Assigned with an image of the goal, an embodied agent must explore to discover …

Cited by 35 · Related articles · All 6 versions

[PDF] thecvf.com

The surprising effectiveness of visual odometry techniques for embodied pointgoal navigation

X Zhao, H Agrawal, D Batra… - Proceedings of the …, 2021 - openaccess.thecvf.com

It is fundamental for personal robots to reliably navigate to a specified goal. To study this
task, PointGoal navigation has been introduced in simulated Embodied AI environments …

Cited by 49 · Related articles · All 7 versions

[PDF] thecvf.com

Slot-VPS: Object-centric representation learning for video panoptic segmentation

Y Zhou, H Zhang, H Lee, S Sun, P Li… - Proceedings of the …, 2022 - openaccess.thecvf.com

Video Panoptic Segmentation (VPS) aims at assigning a class label to each pixel,
uniquely segmenting and identifying all object instances consistently across all frames …

Cited by 30 · Related articles · All 6 versions

[PDF] neurips.cc

Truly scale-equivariant deep nets with Fourier layers

MA Rahman, RA Yeh - Advances in Neural Information …, 2023 - proceedings.neurips.cc

In computer vision, models must be able to adapt to changes in image resolution to
effectively carry out tasks such as image segmentation; this is known as scale-equivariance …

Cited by 6 · Related articles · All 5 versions

[PDF] neurips.cc

Learnable polyphase sampling for shift invariant and equivariant convolutional networks

RA Rojas-Gomez, TY Lim, A Schwing… - Advances in Neural …, 2022 - proceedings.neurips.cc

We propose learnable polyphase sampling (LPS), a pair of learnable down/upsampling
layers that enable truly shift-invariant and equivariant convolutional networks. LPS can be …

Cited by 8 · Related articles · All 8 versions

[PDF] thecvf.com

GridToPix: Training embodied agents with minimal supervision

U Jain, IJ Liu, S Lazebnik… - Proceedings of the …, 2021 - openaccess.thecvf.com

While deep reinforcement learning (RL) promises freedom from hand-labeled data, great
successes, especially for Embodied AI, require significant work to create supervision via …

Cited by 24 · Related articles · All 5 versions

[PDF] thecvf.com

Making vision transformers truly shift-equivariant

RA Rojas-Gomez, TY Lim, MN Do… - Proceedings of the …, 2024 - openaccess.thecvf.com

In the field of computer vision, Vision Transformers (ViTs) have emerged as a prominent
deep learning architecture. Despite being inspired by Convolutional Neural Networks …

Cited by 4 · Related articles · All 3 versions

[PDF] arxiv.org

Adaptive action supervision in reinforcement learning from real-world multi-agent demonstrations

K Fujii, K Tsutsui, A Scott, H Nakahara… - arXiv preprint arXiv …, 2023 - arxiv.org

Modeling of real-world biological multi-agents is a fundamental problem in various scientific
and engineering fields. Reinforcement learning (RL) is a powerful framework to generate …

Cited by 6 · Related articles · All 2 versions

[PDF] arxiv.org

OCAtari: Object-centric Atari 2600 reinforcement learning environments

Q Delfosse, J Blüml, B Gregori, S Sztwiertnia… - arXiv preprint arXiv …, 2023 - arxiv.org

Cognitive science and psychology suggest that object-centric representations of complex
scenes are a promising step towards enabling efficient abstract reasoning from low-level …

Cited by 11 · Related articles · All 4 versions

Object-centric Representation Learning for Video Scene Understanding

Y Zhou, H Zhang, SI Park, BI Yoo… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org

Depth-aware Video Panoptic Segmentation (DVPS) is a challenging task that requires
predicting the semantic class and 3D depth of each pixel in a video, while also segmenting …
