Large multi-modal encoders for recommendation

M Wang, Y Zhao, J Liu, J Chen, C Zhuang, J Gu… - arXiv preprint arXiv …, 2023 - arxiv.org

The deployment of Large Multimodal Models (LMMs) within AntGroup has significantly
advanced multimodal tasks in payment, security, and advertising, notably enhancing …

Cited by 1 · Related articles · All 2 versions

A Directional Diffusion Graph Transformer for Recommendation

Z Yi, X Wang, I Ounis - arXiv preprint arXiv:2404.03326, 2024 - arxiv.org

In real-world recommender systems, implicitly collected user feedback, while abundant,
often includes noisy false-positive and false-negative interactions. The possible …

Related articles · All 2 versions

CLCE: An Approach to Refining Cross-Entropy and Contrastive Learning for Optimized Learning Fusion

Z Long, G Killick, L Zhuang… - arXiv preprint arXiv …, 2024 - arxiv.org

State-of-the-art pre-trained image models predominantly adopt a two-stage approach: initial
unsupervised pre-training on large-scale datasets followed by task-specific fine-tuning using …

Related articles · All 2 versions

Text2Pic Swift: Enhancing Long-Text to Image Retrieval for Large-Scale Libraries

Z Long, X Ge, R McCreadie, J Jose - arXiv preprint arXiv:2402.15276, 2024 - arxiv.org

Text-to-image retrieval plays a crucial role across various applications, including digital
libraries, e-commerce platforms, and multimedia databases, by enabling the search for …

Related articles · All 2 versions

LaCViT: A Label-Aware Contrastive Fine-Tuning Framework for Vision Transformers

Z Long, R McCreadie, GA Camarasa… - ICASSP 2024-2024 …, 2024 - ieeexplore.ieee.org

Vision Transformers (ViTs) have emerged as popular models in computer vision,
demonstrating state-of-the-art performance across various tasks. This success typically …

Cited by 2 · Related articles · All 4 versions

Large Multimodal Model Compression via Iterative Efficient Pruning and Distillation

M Wang, Y Zhao, J Liu, J Chen, C Zhuang… - … Proceedings of the …, 2024 - dl.acm.org

The deployment of Large Multimodal Models (LMMs) within Ant Group has significantly
advanced multimodal tasks in payment, security, and advertising, notably enhancing …

Multiway-Adapter: Adapting Multimodal Large Language Models for Scalable Image-Text Retrieval

Z Long, G Killick, R McCreadie… - ICASSP 2024-2024 …, 2024 - ieeexplore.ieee.org

As Multimodal Large Language Models (MLLMs) grow in size, adapting them to specialized
tasks becomes increasingly challenging due to high computational and memory demands …

Related articles · All 2 versions

Ducho 2.0: Towards a More Up-to-Date Feature Extraction and Processing Framework for Multimodal Recommendation

M Attimonelli, D Danese, D Malitesta, C Pomo… - arXiv preprint arXiv …, 2024 - arxiv.org

In this work, we introduce Ducho 2.0, the latest stable version of our framework. Differently
from Ducho, Ducho 2.0 offers a more personalized user experience with the definition and …

Ducho 2.0: Towards a More Up-to-Date Unified Framework for the Extraction of Multimodal Features in Recommendation

M Attimonelli, D Danese, D Malitesta, C Pomo… - … Proceedings of the …, 2024 - dl.acm.org

In this work, we introduce Ducho 2.0, the latest stable version of our framework. Differently
from Ducho, Ducho 2.0 offers a more personalized user experience with the definition and …

Cited by 1 · Related articles · All 2 versions

WIP: A First Look At Employing Large Multimodal Models Against Autonomous Vehicle Attacks

M Aldeen, P MohajerAnsari, J Ma, M Chowdhury… - ndss-symposium.org

As the advent of autonomous vehicle (AV) technology revolutionizes transportation, it
simultaneously introduces new vulnerabilities to cyber-attacks, posing significant challenges …
