Causal inference is a critical research topic across many domains, such as statistics, computer science, education, public policy, and economics, for decades. Nowadays …
The field of graph neural networks (GNNs) has seen rapid and incredible strides over the recent years. Graph neural networks, also known as deep learning on graphs, graph …
Recent methods for visual question answering rely on large-scale annotated datasets. Manual annotation of questions and answers for videos, however, is tedious, expensive and …
H Tan, M Bansal - arXiv preprint arXiv:1908.07490, 2019 - arxiv.org
Vision-and-language reasoning requires an understanding of visual concepts, language semantics, and, most importantly, the alignment and relationships between these two …
We present VILLA, the first known effort on large-scale adversarial training for vision-and- language (V+ L) representation learning. VILLA consists of two training stages:(i) task …
L Kong, C Lian, D Huang, Y Hu… - Advances in Neural …, 2021 - proceedings.neurips.cc
Abstract Supervised Pix2Pix and unsupervised Cycle-consistency are two modes that dominate the field of medical image-to-image translation. However, neither modes are ideal …
L Chen, X Yan, J Xiao, H Zhang… - Proceedings of the …, 2020 - openaccess.thecvf.com
Abstract Despite Visual Question Answering (VQA) has realized impressive progress over the last few years, today's VQA models tend to capture superficial linguistic correlations in …
Deep learning methods haverevolutionized speech recognition, image recognition, and natural language processing since 2010. Each of these tasks involves a single modality in …
Many real-world video-text tasks involve different levels of granularity, such as frames and words, clip and sentences or videos and paragraphs, each with distinct semantics. In this …