Alignment and generation adapter for efficient video-text understanding

Advancing Multimodal Large Language Models with Quantization-Aware Scale Learning for Efficient Adaptation

J Xie, Y Zhang, M Lin, L Cao, R Ji - Proceedings of the 32nd ACM …, 2024 - dl.acm.org

This paper presents the first study to explore the potential of parameter quantization for
multimodal large language models to alleviate the significant resource constraint …

Cited by 2 · Related articles · All 5 versions

[PDF] arxiv.org

Foundation Models for Video Understanding: A Survey

N Madan, A Møgelmose, R Modi, YS Rawat… - arXiv preprint arXiv …, 2024 - arxiv.org

Video Foundation Models (ViFMs) aim to learn a general-purpose representation for various
video understanding tasks. Leveraging large-scale datasets and powerful models, ViFMs …

Cited by 16 · Related articles · All 5 versions

[PDF] researchsquare.com

Efficient Ransomware Detection via Portable Executable File Image Analysis By LLaMA-7b

X Li, T Zhu, W Zhang - 2023 - researchsquare.com

This research focuses on developing a novel ransomware detection methodology
leveraging the capabilities of the open source large language model LLaMA-7b and image …

Cited by 116 · Related articles · All 2 versions

[PDF] arxiv.org

Parameter Efficient Fine Tuning: A Comprehensive Analysis Across Applications

CCS Balne, S Bhaduri, T Roy, V Jain… - arXiv preprint arXiv …, 2024 - arxiv.org

The rise of deep learning has marked significant progress in fields such as computer vision,
natural language processing, and medical imaging, primarily through the adaptation of pre …

Cited by 11 · Related articles · All 2 versions

[PDF] ssrn.com

Knowledge Enhancement and Disentanglement Learning for Video Captioning

M Wang, Y Ma, B Cai, D Li, X He… - Available at SSRN …, 2024 - papers.ssrn.com

Video captioning, bridging computer vision and natural language, is crucial for various
knowledge-based systems in the age of video streaming. Recent advancements in video …
