Large Multimodal Model Compression via Efficient Pruning and Distillation at AntGroup

M Wang, Y Zhao, J Liu, J Chen, C Zhuang, J Gu… - arXiv preprint arXiv …, 2023 - arxiv.org
The deployment of Large Multimodal Models (LMMs) within AntGroup has significantly
advanced multimodal tasks in payment, security, and advertising, notably enhancing …

A Directional Diffusion Graph Transformer for Recommendation

Z Yi, X Wang, I Ounis - arXiv preprint arXiv:2404.03326, 2024 - arxiv.org
In real-world recommender systems, implicitly collected user feedback, while abundant,
often includes noisy false-positive and false-negative interactions. The possible …

CLCE: An Approach to Refining Cross-Entropy and Contrastive Learning for Optimized Learning Fusion

Z Long, G Killick, L Zhuang… - arXiv preprint arXiv …, 2024 - arxiv.org
State-of-the-art pre-trained image models predominantly adopt a two-stage approach: initial
unsupervised pre-training on large-scale datasets followed by task-specific fine-tuning using …

Text2Pic Swift: Enhancing Long-Text to Image Retrieval for Large-Scale Libraries

Z Long, X Ge, R McCreadie, J Jose - arXiv preprint arXiv:2402.15276, 2024 - arxiv.org
Text-to-image retrieval plays a crucial role across various applications, including digital
libraries, e-commerce platforms, and multimedia databases, by enabling the search for …

LaCViT: A Label-Aware Contrastive Fine-Tuning Framework for Vision Transformers

Z Long, R McCreadie, GA Camarasa… - ICASSP 2024-2024 …, 2024 - ieeexplore.ieee.org
Vision Transformers (ViTs) have emerged as popular models in computer vision,
demonstrating state-of-the-art performance across various tasks. This success typically …

Large Multimodal Model Compression via Iterative Efficient Pruning and Distillation

M Wang, Y Zhao, J Liu, J Chen, C Zhuang… - … Proceedings of the …, 2024 - dl.acm.org
The deployment of Large Multimodal Models (LMMs) within Ant Group has significantly
advanced multimodal tasks in payment, security, and advertising, notably enhancing …

Multiway-Adapter: Adapting Multimodal Large Language Models for Scalable Image-Text Retrieval

Z Long, G Killick, R McCreadie… - ICASSP 2024-2024 …, 2024 - ieeexplore.ieee.org
As Multimodal Large Language Models (MLLMs) grow in size, adapting them to specialized
tasks becomes increasingly challenging due to high computational and memory demands …

Ducho 2.0: Towards a More Up-to-Date Feature Extraction and Processing Framework for Multimodal Recommendation

M Attimonelli, D Danese, D Malitesta, C Pomo… - arXiv preprint arXiv …, 2024 - arxiv.org
In this work, we introduce Ducho 2.0, the latest stable version of our framework. Differently
from Ducho, Ducho 2.0 offers a more personalized user experience with the definition and …

Ducho 2.0: Towards a More Up-to-Date Unified Framework for the Extraction of Multimodal Features in Recommendation

M Attimonelli, D Danese, D Malitesta, C Pomo… - … Proceedings of the …, 2024 - dl.acm.org
In this work, we introduce Ducho 2.0, the latest stable version of our framework. Differently
from Ducho, Ducho 2.0 offers a more personalized user experience with the definition and …

WIP: A First Look At Employing Large Multimodal Models Against Autonomous Vehicle Attacks

M Aldeen, P MohajerAnsari, J Ma, M Chowdhury… - ndss-symposium.org
As the advent of autonomous vehicle (AV) technology revolutionizes transportation, it
simultaneously introduces new vulnerabilities to cyber-attacks, posing significant challenges …