Advancing Multimodal Large Language Models with Quantization-Aware Scale Learning for Efficient Adaptation

J Xie, Y Zhang, M Lin, L Cao, R Ji - Proceedings of the 32nd ACM …, 2024 - dl.acm.org
This paper presents the first study to explore the potential of parameter quantization for
multimodal large language models to alleviate the significant resource constraint …

Foundation Models for Video Understanding: A Survey

N Madan, A Møgelmose, R Modi, YS Rawat… - arXiv preprint arXiv …, 2024 - arxiv.org
Video Foundation Models (ViFMs) aim to learn a general-purpose representation for various
video understanding tasks. Leveraging large-scale datasets and powerful models, ViFMs …

Efficient Ransomware Detection via Portable Executable File Image Analysis By LLaMA-7b

X Li, T Zhu, W Zhang - 2023 - researchsquare.com
This research focuses on developing a novel ransomware detection methodology
leveraging the capabilities of the open source large language model LLaMA-7b and image …

Parameter Efficient Fine Tuning: A Comprehensive Analysis Across Applications

CCS Balne, S Bhaduri, T Roy, V Jain… - arXiv preprint arXiv …, 2024 - arxiv.org
The rise of deep learning has marked significant progress in fields such as computer vision,
natural language processing, and medical imaging, primarily through the adaptation of pre …

Knowledge Enhancement and Disentanglement Learning for Video Captioning

M Wang, Y Ma, B Cai, D Li, X He… - Available at SSRN …, 2024 - papers.ssrn.com
Video captioning, bridging computer vision and natural language, is crucial for various
knowledge-based systems in the age of video streaming. Recent advancements in video …