Z Wang, L Li, Z Xie, C Liu - Computer Vision and Image Understanding, 2024 - Elsevier
Procedural text generation from visual observation of instructional videos, covering
activities such as assembly, biochemical experiments, and cooking, is an essential task for scene …