Last-mile embodied visual navigation

J Wasserman, K Yadav, G Chowdhary… - … on Robot Learning, 2023 - proceedings.mlr.press
Realistic long-horizon tasks like image-goal navigation involve exploratory and exploitative
phases. Assigned with an image of the goal, an embodied agent must explore to discover …

The surprising effectiveness of visual odometry techniques for embodied PointGoal navigation

X Zhao, H Agrawal, D Batra… - Proceedings of the …, 2021 - openaccess.thecvf.com
It is fundamental for personal robots to reliably navigate to a specified goal. To study this
task, PointGoal navigation has been introduced in simulated Embodied AI environments …

Slot-VPS: Object-centric representation learning for video panoptic segmentation

Y Zhou, H Zhang, H Lee, S Sun, P Li… - Proceedings of the …, 2022 - openaccess.thecvf.com
Video Panoptic Segmentation (VPS) aims at assigning a class label to each pixel,
uniquely segmenting and identifying all object instances consistently across all frames …

Truly scale-equivariant deep nets with Fourier layers

MA Rahman, RA Yeh - Advances in Neural Information …, 2023 - proceedings.neurips.cc
In computer vision, models must be able to adapt to changes in image resolution to
effectively carry out tasks such as image segmentation; this is known as scale-equivariance …

Learnable polyphase sampling for shift invariant and equivariant convolutional networks

RA Rojas-Gomez, TY Lim, A Schwing… - Advances in Neural …, 2022 - proceedings.neurips.cc
We propose learnable polyphase sampling (LPS), a pair of learnable down/upsampling
layers that enable truly shift-invariant and equivariant convolutional networks. LPS can be …

GridToPix: Training embodied agents with minimal supervision

U Jain, IJ Liu, S Lazebnik… - Proceedings of the …, 2021 - openaccess.thecvf.com
While deep reinforcement learning (RL) promises freedom from hand-labeled data, great
successes, especially for Embodied AI, require significant work to create supervision via …

Making vision transformers truly shift-equivariant

RA Rojas-Gomez, TY Lim, MN Do… - Proceedings of the …, 2024 - openaccess.thecvf.com
In the field of computer vision, Vision Transformers (ViTs) have emerged as a prominent
deep learning architecture. Despite being inspired by Convolutional Neural Networks …

Adaptive action supervision in reinforcement learning from real-world multi-agent demonstrations

K Fujii, K Tsutsui, A Scott, H Nakahara… - arXiv preprint arXiv …, 2023 - arxiv.org
Modeling of real-world biological multi-agents is a fundamental problem in various scientific
and engineering fields. Reinforcement learning (RL) is a powerful framework to generate …

OCAtari: Object-centric Atari 2600 reinforcement learning environments

Q Delfosse, J Blüml, B Gregori, S Sztwiertnia… - arXiv preprint arXiv …, 2023 - arxiv.org
Cognitive science and psychology suggest that object-centric representations of complex
scenes are a promising step towards enabling efficient abstract reasoning from low-level …

Object-centric representation learning for video scene understanding

Y Zhou, H Zhang, SI Park, BI Yoo… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
Depth-aware Video Panoptic Segmentation (DVPS) is a challenging task that requires
predicting the semantic class and 3D depth of each pixel in a video, while also segmenting …