Distributional semantics of objects in visual scenes in comparison to text

K Yao, J Liang, J Liang, M Li, F Cao - Artificial Intelligence, 2022 - Elsevier

Recent advances in graph convolutional networks (GCNs), which mainly focus on how to
exploit information from different hops of neighbors in an efficient way, have brought …

Cited by 63 · Related articles · All 6 versions

[PDF] mit.edu

Word representation learning in multimodal pre-trained transformers: An intrinsic evaluation

S Pezzelle, E Takmaz, R Fernández - Transactions of the Association …, 2021 - direct.mit.edu

This study carries out a systematic intrinsic evaluation of the semantic representations
learned by state-of-the-art pre-trained multimodal Transformers. These representations are …

Cited by 19 · Related articles · All 7 versions

[PDF] springer.com

Language with vision: A study on grounded word and sentence embeddings

H Shahmohammadi, M Heitmeier… - Behavior Research …, 2024 - Springer

Grounding language in vision is an active field of research seeking to construct cognitively
plausible word and sentence representations by incorporating perceptual knowledge from …

Cited by 9 · Related articles · All 13 versions

[PDF] arxiv.org

Leverage points in modality shifts: Comparing language-only and multimodal word representations

A Tikhonov, L Bylinina, D Paperno - arXiv preprint arXiv:2306.02348, 2023 - arxiv.org

Multimodal embeddings aim to enrich the semantic information in neural representations of
language compared to text-only models. While different embeddings exhibit different …

Cited by 3 · Related articles · All 5 versions

Exploring diagram-based visual problem representation and relational abstraction

CD Nath, SM Hazarika - Spatial Cognition & Computation, 2025 - Taylor & Francis

For visual information processing, the derivation of meaningful low-level spatio-temporal
information is challenging. In line with human visualisation and perception in spatial …

Cited by 1 · Related articles

[PDF] utep.edu

Context-Aware Temporal Embeddings for Text and Video Data

A Farhan - 2023 - search.proquest.com

Recent years have seen an exponential increase in unstructured data, primarily in the form
of text, images, and videos. Extracting useful features and trends from large-scale …

Cited by 1 · Related articles · All 3 versions

Traffic sign recognition and distance estimation with YOLOv3 model

GSR Nath, J Acharjee, S Deb - 2021 International Conference …, 2021 - ieeexplore.ieee.org

Due to the expeditious increase in the number of vehicles, there is an increase in the
number of road casualties even in a highly sophisticated roadway. This depicts the natural …

Cited by 4 · Related articles

[PDF] arvojournals.org

Semantic object-scene inconsistencies affect eye movements, but not in the way predicted by contextualized meaning maps

MA Pedziwiatr, M Kümmerer, TSA Wallis… - Journal of …, 2022 - jov.arvojournals.org

Semantic information is important in eye movement control. An important semantic influence
on gaze guidance relates to object-scene relationships: objects that are semantically …

Cited by 5 · Related articles · All 11 versions

[PDF] thecvf.com

Towards contextual learning in few-shot object classification

MP Fortin, B Chaib-draa - Proceedings of the IEEE/CVF …, 2021 - openaccess.thecvf.com

Few-shot Learning (FSL) aims to classify new concepts from a small number of examples.
While there have been an increasing amount of work on few-shot object classification in the …

Cited by 6 · Related articles · All 6 versions

Vizobj2vec: Contextual representation learning for visual objects in video-frames

A Farhan, MS Hossain - … Conference on Big Data (Big Data), 2020 - ieeexplore.ieee.org

While the use of the distributional hypothesis has become popular in creating embedding for
text corpus, it is rarely used for generating the contextual (distributed) representation of …

Cited by 3 · Related articles · All 2 versions
