With recent advances in large language models (LLMs), there is growing interest in combining LLMs with multimodal learning. Previous surveys of multimodal large language …
Image editing aims to edit a given synthetic or real image to meet specific user requirements. It has been widely studied in recent years as a promising and challenging field of …
R Sajnani, J Vanbaar, J Min, K Katyal… - arXiv preprint arXiv …, 2024 - arxiv.org
The success of image generative models has enabled us to build methods that can edit images based on text or other user input. However, these methods are bespoke, imprecise …
C Min, S Sridhar - arXiv preprint arXiv:2406.05059, 2024 - arxiv.org
Grasping is an important human activity that has long been studied in robotics, computer vision, and cognitive science. Most existing works study grasping from the perspective of …
J Lu, X Li, K Han - arXiv preprint arXiv:2407.18247, 2024 - arxiv.org
Point-drag-based image editing methods, like DragDiffusion, have attracted significant attention. However, point-drag-based approaches suffer from computational overhead and …
X Cui, P Li, Z Li, X Liu, Y Zou, Z He - arXiv preprint arXiv:2406.00432, 2024 - arxiv.org
Flexible and accurate drag-based editing is a challenging task that has recently garnered significant attention. Current methods typically model this problem as automatically …
Point-based image editing enables accurate and flexible control through content dragging. However, the role of text embedding in the editing process has not been thoroughly …
Drag-based image editing using generative models provides precise control over image contents, enabling users to manipulate anything in an image with a few clicks. However …