Visual grounding of learned physical models

A Kadambi, C de Melo, CJ Hsieh… - Nature Machine …, 2023 - nature.com

Many computer vision techniques infer properties of our physical world from images.
Although images are formed through the physics of light and mechanics, computer vision …

Cited by 23 · Related articles · All 4 versions

[PDF] arxiv.org

On the opportunities and risks of foundation models

R Bommasani, DA Hudson, E Adeli, R Altman… - arXiv preprint arXiv …, 2021 - arxiv.org

AI is undergoing a paradigm shift with the rise of models (e.g., BERT, DALL-E, GPT-3) that are
trained on broad data at scale and are adaptable to a wide range of downstream tasks. We …

Cited by 4550 · Related articles · All 2 versions

[PDF] arxiv.org

Physically grounded vision-language models for robotic manipulation

J Gao, B Sarkar, F Xia, T Xiao, J Wu… - … on Robotics and …, 2024 - ieeexplore.ieee.org

Recent advances in vision-language models (VLMs) have led to improved performance on
tasks such as visual question answering and image captioning. Consequently, these models …

Cited by 86 · Related articles · All 2 versions

[PDF] arxiv.org

On the binding problem in artificial neural networks

K Greff, S Van Steenkiste, J Schmidhuber - arXiv preprint arXiv …, 2020 - arxiv.org

Contemporary neural networks still fall short of human-level generalization, which extends
far beyond our direct experiences. In this paper, we argue that the underlying cause for this …

Cited by 296 · Related articles · All 2 versions

[PDF] mlr.press

3D neural scene representations for visuomotor control

Y Li, S Li, V Sitzmann, P Agrawal… - Conference on Robot …, 2022 - proceedings.mlr.press

Humans have a strong intuitive understanding of the 3D environment around us. The mental
model of the physics in our brain applies to objects of different materials and enables us to …

Cited by 142 · Related articles · All 7 versions

[PDF] arxiv.org

PhysGen: Rigid-body physics-grounded image-to-video generation

S Liu, Z Ren, S Gupta, S Wang - European Conference on Computer …, 2025 - Springer

We present PhysGen, a novel image-to-video generation method that converts a single
image and an input condition (e.g., force and torque applied to an object in the image) to …

Cited by 14 · Related articles · All 9 versions

[PDF] mlr.press

Learning multi-object dynamics with compositional neural radiance fields

D Driess, Z Huang, Y Li, R Tedrake… - Conference on robot …, 2023 - proceedings.mlr.press

We present a method to learn compositional multi-object dynamics models from image
observations based on implicit object encoders, Neural Radiance Fields (NeRFs), and …

Cited by 88 · Related articles · All 9 versions

[PDF] google.com

Algorithms and systems for manipulating multiple objects

Z Pan, A Zeng, Y Li, J Yu… - IEEE Transactions on …, 2022 - ieeexplore.ieee.org

Robot manipulation of multiple objects is an important topic for applications including
warehouse automation, service robots performing cleaning, and large-scale object sorting …

Cited by 25 · Related articles · All 6 versions

[PDF] neurips.cc

Dynamic visual reasoning by learning differentiable physics models from video and language

M Ding, Z Chen, T Du, P Luo… - Advances In Neural …, 2021 - proceedings.neurips.cc

In this work, we propose a unified framework, called Visual Reasoning with Differentiable
Physics (VRDP), that can jointly learn visual concepts and infer physics models of objects …

Cited by 76 · Related articles · All 8 versions

[PDF] arxiv.org

RoboCraft: Learning to see, simulate, and shape elasto-plastic objects in 3D with graph networks

H Shi, H Xu, Z Huang, Y Li… - The International Journal …, 2024 - journals.sagepub.com

Modeling and manipulating elasto-plastic objects are essential capabilities for robots to
perform complex industrial and household interaction tasks (e.g., stuffing dumplings, rolling …

Cited by 72 · Related articles · All 10 versions
