A Survey on Self-supervised Learning: Algorithms, Applications, and Future Trends

J Gui, T Chen, J Zhang, Q Cao, Z Sun… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
Deep supervised learning algorithms typically require a large volume of labeled data to
achieve satisfactory performance. However, the process of collecting and labeling such data …

NTIRE 2024 challenge on short-form UGC video quality assessment: Methods and results

X Li, K Yuan, Y Pei, Y Lu, M Sun… - Proceedings of the …, 2024 - openaccess.thecvf.com
This paper reviews the NTIRE 2024 Challenge on Short-form UGC Video Quality
Assessment (S-UGC VQA), where various excellent solutions were submitted and evaluated …

A survey on multimodal large language models

S Yin, C Fu, S Zhao, K Li, X Sun, T Xu… - arXiv preprint arXiv …, 2023 - arxiv.org
Multimodal Large Language Models (MLLMs) have recently become a rising research
hotspot; they use powerful Large Language Models (LLMs) as a brain to perform …

AnyDoor: Zero-shot object-level image customization

X Chen, L Huang, Y Liu, Y Shen… - Proceedings of the …, 2024 - openaccess.thecvf.com
This work presents AnyDoor, a diffusion-based image generator with the power to teleport
target objects to new scenes at user-specified locations with desired shapes. Instead of …

Eyes wide shut? Exploring the visual shortcomings of multimodal LLMs

S Tong, Z Liu, Y Zhai, Y Ma… - Proceedings of the …, 2024 - openaccess.thecvf.com
Is vision good enough for language? Recent advancements in multimodal models primarily
stem from the powerful reasoning abilities of large language models (LLMs). However, the …

Multimodal foundation models: From specialists to general-purpose assistants

C Li, Z Gan, Z Yang, J Yang, L Li… - … and Trends® in …, 2024 - nowpublishers.com
Neural compression is the application of neural networks and other machine learning
methods to data compression. Recent advances in statistical machine learning have opened …

Towards a general-purpose foundation model for computational pathology

RJ Chen, T Ding, MY Lu, DFK Williamson, G Jaume… - Nature Medicine, 2024 - nature.com
Quantitative evaluation of tissue images is crucial for computational pathology (CPath) tasks,
requiring the objective characterization of histopathological entities from whole-slide images …

Triplane meets Gaussian splatting: Fast and generalizable single-view 3D reconstruction with transformers

ZX Zou, Z Yu, YC Guo, Y Li, D Liang… - Proceedings of the …, 2024 - openaccess.thecvf.com
Recent advancements in 3D reconstruction from single images have been driven by the
evolution of generative models. Prominent among these are methods based on Score …

Adversarial diffusion distillation

A Sauer, D Lorenz, A Blattmann… - arXiv preprint arXiv …, 2023 - arxiv.org
We introduce Adversarial Diffusion Distillation (ADD), a novel training approach that
efficiently samples large-scale foundational image diffusion models in just 1-4 steps while …

InternVL: Scaling up vision foundation models and aligning for generic visual-linguistic tasks

Z Chen, J Wu, W Wang, W Su, G Chen… - Proceedings of the …, 2024 - openaccess.thecvf.com
The exponential growth of large language models (LLMs) has opened up numerous
possibilities for multi-modal AGI systems. However, the progress in vision and vision …