Ego-Exo4D: Understanding skilled human activity from first- and third-person perspectives

K Grauman, A Westbury, L Torresani… - Proceedings of the …, 2024 - openaccess.thecvf.com
We present Ego-Exo4D, a diverse, large-scale multimodal multiview video dataset
and benchmark challenge. Ego-Exo4D centers around simultaneously-captured egocentric …

Progress-aware online action segmentation for egocentric procedural task videos

Y Shen, E Elhamifar - … of the IEEE/CVF Conference on …, 2024 - openaccess.thecvf.com
We address the problem of online action segmentation for egocentric procedural task
videos. While previous studies have mostly focused on offline action segmentation where …

Unsupervised task graph generation from instructional video transcripts

L Logeswaran, S Sohn, Y Jang, M Lee… - arXiv preprint arXiv …, 2023 - arxiv.org
This work explores the problem of generating task graphs of real-world activities. Different
from prior formulations, we consider a setting where text transcripts of instructional videos …

Non-Sequential Graph Script Induction via Multimedia Grounding

Y Zhou, S Li, M Li, X Lin, SF Chang, M Bansal… - arXiv preprint arXiv …, 2023 - arxiv.org
Online resources such as WikiHow compile a wide range of scripts for performing everyday
tasks, which can assist models in learning to reason about procedures. However, the scripts …

Code Models are Zero-shot Precondition Reasoners

L Logeswaran, S Sohn, Y Lyu, AZ Liu, DK Kim… - arXiv preprint arXiv …, 2023 - arxiv.org
One of the fundamental skills required for an agent acting in an environment to complete
tasks is the ability to understand what actions are plausible at any given point. This work …

Differentiable Task Graph Learning: Procedural Activity Representation and Online Mistake Detection from Egocentric Videos

L Seminara, GM Farinella, A Furnari - arXiv preprint arXiv:2406.01486, 2024 - arxiv.org
Procedural activities are sequences of key-steps aimed at achieving specific goals. They are
crucial to build intelligent agents able to assist users effectively. In this context, task graphs …

TOD-Flow: Modeling the Structure of Task-Oriented Dialogues

S Sohn, Y Lyu, A Liu, L Logeswaran, DK Kim… - arXiv preprint arXiv …, 2023 - arxiv.org
Task-Oriented Dialogue (TOD) systems have become crucial components in interactive
artificial intelligence applications. While recent advances have capitalized on pre-trained …

Box2Flow: Instance-Based Action Flow Graphs from Videos

J Li, K Basioti, V Pavlovic - International Conference on Pattern …, 2024 - Springer
A large number of procedural videos on the web show how to complete various tasks. These
tasks can often be accomplished in different ways and step orderings, with some steps able …

Task Graph Based Mistake Detection in Instructional Videos

P Sell - 2024 - etda.libraries.psu.edu
The primary focus of this thesis is to examine the impact of integrating a structural task graph
into a visual recognition network to accurately identify and segment errors in the assembly of …

Event-centric multimodal knowledge acquisition

M Li - 2023 - ideals.illinois.edu
What happened? Who? When? Where? Why? What will happen next? are the
fundamental questions asked to comprehend the overwhelming amount of information …