ScanRefer: 3D object localization in RGB-D scans using natural language

DZ Chen, AX Chang, M Nießner - European Conference on Computer …, 2020 - Springer
We introduce the task of 3D object localization in RGB-D scans using natural language
descriptions. As input, we assume a point cloud of a scanned 3D scene along with a free …

RSVG: Exploring data and models for visual grounding on remote sensing data

Y Zhan, Z Xiong, Y Yuan - IEEE Transactions on Geoscience …, 2023 - ieeexplore.ieee.org
In this article, we introduce the task of visual grounding for remote sensing data (RSVG).
RSVG aims to localize the referred objects in remote sensing (RS) images with the guidance …

Coarse-to-fine reasoning for visual question answering

BX Nguyen, T Do, H Tran, E Tjiputra… - Proceedings of the …, 2022 - openaccess.thecvf.com
Bridging the semantic gap between image and question is an important step to improve the
accuracy of the Visual Question Answering (VQA) task. However, most of the existing VQA …

Graph-based person signature for person re-identifications

BX Nguyen, BD Nguyen, T Do… - Proceedings of the …, 2021 - openaccess.thecvf.com
The task of person re-identification (ReID) is to match images of the same person over
multiple non-overlapping camera views. Due to the variations in visual factors, previous …

Real-time 6DOF pose relocalization for event cameras with stacked spatial LSTM networks

A Nguyen, TT Do, DG Caldwell… - Proceedings of the …, 2019 - openaccess.thecvf.com
We present a new method to relocalize the 6DOF pose of an event camera solely based on
the event stream. Our method first creates the event image from a list of events that occurs in …

A joint network for grasp detection conditioned on natural language commands

Y Chen, R Xu, Y Lin, PA Vela - 2021 IEEE International …, 2021 - ieeexplore.ieee.org
We consider the task of grasping a target object based on a natural language command
query. Previous work primarily focused on localizing the object given the query, which …

Light-weight deformable registration using adversarial learning with distilling knowledge

MQ Tran, T Do, H Tran, E Tjiputra… - IEEE Transactions on …, 2022 - ieeexplore.ieee.org
Deformable registration is a crucial step in many medical procedures such as image-guided
surgery and radiation therapy. Most recent learning-based methods focus on improving the …

Autonomous navigation in complex environments with deep multimodal fusion network

A Nguyen, N Nguyen, K Tran… - 2020 IEEE/RSJ …, 2020 - ieeexplore.ieee.org
Autonomous navigation in complex environments is a crucial task in time-sensitive
scenarios such as disaster response or search and rescue. However, complex environments …

Exploration of Cross‐Modal Text Generation Methods in Smart Justice

Y Zhang - Scientific Programming, 2021 - Wiley Online Library
With the development of modern science and technology, information technology has
brought great changes to many fields. Smart justice has become one of the increasing areas …

Language conditioned multi-scale visual attention networks for visual grounding

H Yao, L Wang, C Cai, W Wang, Z Zhang… - Image and Vision …, 2024 - Elsevier
Visual grounding (VG) is a task that requires locating a specific region in an image
according to a natural language expression. Existing efforts on the VG task are divided into …