Benchmarks for automated commonsense reasoning: A survey

E Davis - ACM Computing Surveys, 2023 - dl.acm.org
More than one hundred benchmarks have been developed to test the commonsense
knowledge and commonsense reasoning abilities of artificial intelligence (AI) systems …

Capturing the objects of vision with neural networks

B Peters, N Kriegeskorte - Nature Human Behaviour, 2021 - nature.com
Human visual perception carves a scene at its physical joints, decomposing the world into
objects, which are selectively attended, tracked and predicted as we engage our …

Revisiting the "video" in video-language understanding

S Buch, C Eyzaguirre, A Gaidon, J Wu… - Proceedings of the …, 2022 - openaccess.thecvf.com
What makes a video task uniquely suited for videos, beyond what can be understood from a
single image? Building on recent progress in self-supervised image-language models, we …

SAVi++: Towards end-to-end object-centric learning from real-world videos

G Elsayed, A Mahendran… - Advances in …, 2022 - proceedings.neurips.cc
The visual world can be parsimoniously characterized in terms of distinct entities with sparse
interactions. Discovering this compositional structure in dynamic visual scenes has proven …

Intuitive physics learning in a deep-learning model inspired by developmental psychology

LS Piloto, A Weinstein, P Battaglia… - Nature Human …, 2022 - nature.com
'Intuitive physics' enables our pragmatic engagement with the physical world and
forms a key component of 'common sense' aspects of thought. Current artificial intelligence …

Anticipative video transformer

R Girdhar, K Grauman - Proceedings of the IEEE/CVF …, 2021 - openaccess.thecvf.com
We propose Anticipative Video Transformer (AVT), an end-to-end attention-based
video modeling architecture that attends to the previously observed video in order to …

Conditional object-centric learning from video

T Kipf, GF Elsayed, A Mahendran, A Stone… - arXiv preprint arXiv …, 2021 - arxiv.org
Object-centric representations are a promising path toward more systematic generalization
by providing flexible abstractions upon which compositional world models can be built …

Simple unsupervised object-centric learning for complex and naturalistic videos

G Singh, YF Wu, S Ahn - Advances in Neural Information …, 2022 - proceedings.neurips.cc
Unsupervised object-centric learning aims to represent the modular, compositional, and
causal structure of a scene as a set of object representations and thereby promises to …

Extendable multiple nodes recurrent tracking framework with RTU++

S Wang, H Sheng, D Yang, Y Zhang… - IEEE Transactions on …, 2022 - ieeexplore.ieee.org
Recently, tracking-by-detection has become a popular paradigm in Multiple-object tracking
(MOT) for its concise pipeline. Many current works first associate the detections to form track …

STAR: A benchmark for situated reasoning in real-world videos

B Wu, S Yu, Z Chen, JB Tenenbaum, C Gan - arXiv preprint arXiv …, 2024 - arxiv.org
Reasoning in the real world is not divorced from situations. How to capture the present
knowledge from surrounding situations and perform reasoning accordingly is crucial and …