Non-salient region object mining for weakly supervised semantic segmentation

Y Yao, T Chen, GS Xie, C Zhang… - Proceedings of the …, 2021 - openaccess.thecvf.com
Semantic segmentation aims to classify every pixel of an input image. Considering the
difficulty of acquiring dense labels, researchers have recently been resorting to weak labels …

Jo-SRC: A contrastive approach for combating noisy labels

Y Yao, Z Sun, C Zhang, F Shen, Q Wu… - Proceedings of the …, 2021 - openaccess.thecvf.com
Due to the memorization effect in Deep Neural Networks (DNNs), training with noisy labels
usually results in inferior model performance. Existing state-of-the-art methods primarily …

Visual language integration: A survey and open challenges

SM Park, YG Kim - Computer Science Review, 2023 - Elsevier
With the recent development of deep learning technology comes the wide use of artificial
intelligence (AI) models in various domains. AI shows good performance for definite …

Divide-and-conquer predictor for unbiased scene graph generation

X Han, X Dong, X Song, T Gan, Y Zhan… - … on Circuits and …, 2022 - ieeexplore.ieee.org
Scene Graph Generation (SGG) aims to detect the objects and their pairwise predicates in
an image. Existing SGG methods mainly fulfil the challenging predicate prediction task that …

Semantically meaningful class prototype learning for one-shot image segmentation

T Chen, GS Xie, Y Yao, Q Wang, F Shen… - IEEE Transactions …, 2021 - ieeexplore.ieee.org
One-shot semantic image segmentation aims to segment the object regions for the novel
class with only one annotated image. Recent works adopt the episodic training strategy to …

CRSSC: salvage reusable samples from noisy data for robust learning

Z Sun, XS Hua, Y Yao, XS Wei, G Hu… - Proceedings of the 28th …, 2020 - dl.acm.org
Due to the existence of label noise in web images and the high memorization capacity of
deep neural networks, training deep fine-grained (FG) models directly through web images …

Dense semantics-assisted networks for video action recognition

H Luo, G Lin, Y Yao, Z Tang, Q Wu… - IEEE Transactions on …, 2021 - ieeexplore.ieee.org
Most existing action recognition approaches directly leverage the video-level features to
recognize human actions from videos. Although these methods have made remarkable …

Exploiting web images for fine-grained visual recognition by eliminating open-set noise and utilizing hard examples

H Liu, C Zhang, Y Yao, XS Wei, F Shen… - IEEE Transactions …, 2021 - ieeexplore.ieee.org
Labeling objects at a subordinate level typically requires expert knowledge, which is not
always available when using random annotators. As such, learning directly from web …

Guided by meta-set: a data-driven method for fine-grained visual recognition

C Zhang, G Lin, Q Wang, F Shen… - IEEE Transactions on …, 2022 - ieeexplore.ieee.org
The lack of sufficient training data has been one obstacle to fine-grained visual classification
research because labeling subcategories generally requires specialist knowledge. As one …

EQA-MX: Embodied question answering using multimodal expression

MM Islam, A Gladstone, R Islam… - The Twelfth International …, 2023 - openreview.net
Humans predominantly use verbal utterances and nonverbal gestures (e.g., eye gaze and
pointing gestures) in their natural interactions. For instance, pointing gestures and verbal …