GameVLM: A Decision-making Framework for Robotic Task Planning Based on Visual Language Models and Zero-sum Games

A Mei, J Wang, GN Zhu, Z Gan - arXiv preprint arXiv:2405.13751, 2024 - arxiv.org
With their prominent scene understanding and reasoning capabilities, pre-trained visual-language
models (VLMs) such as GPT-4V have attracted increasing attention in robotic task …