Dataset distillation: A comprehensive review

R Yu, S Liu, X Wang - IEEE Transactions on Pattern Analysis …, 2023 - ieeexplore.ieee.org
Recent success of deep learning is largely attributed to the sheer amount of data used for
training deep neural networks. Despite the unprecedented success, the massive data …

DataComp: In search of the next generation of multimodal datasets

SY Gadre, G Ilharco, A Fang… - Advances in …, 2024 - proceedings.neurips.cc
Multimodal datasets are a critical component in recent breakthroughs such as CLIP, Stable
Diffusion and GPT-4, yet their design does not receive the same research attention as model …

Towards total recall in industrial anomaly detection

K Roth, L Pemula, J Zepeda… - Proceedings of the …, 2022 - openaccess.thecvf.com
Being able to spot defective parts is a critical component in large-scale industrial
manufacturing. A particular challenge that we address in this work is the cold-start problem …

CAFE: Learning to condense dataset by aligning features

K Wang, B Zhao, X Peng, Z Zhu… - Proceedings of the …, 2022 - openaccess.thecvf.com
Dataset condensation aims at reducing the network training effort through condensing a
cumbersome training set into a compact synthetic one. State-of-the-art approaches largely …

Data-efficient Fine-tuning for LLM-based Recommendation

X Lin, W Wang, Y Li, S Yang, F Feng, Y Wei… - Proceedings of the 47th …, 2024 - dl.acm.org
Leveraging Large Language Models (LLMs) for recommendation has recently garnered
considerable attention, where fine-tuning plays a key role in LLMs' adaptation. However, the …

Improved distribution matching for dataset condensation

G Zhao, G Li, Y Qin, Y Yu - … of the IEEE/CVF Conference on …, 2023 - openaccess.thecvf.com
Dataset Condensation aims to condense a large dataset into a smaller one while
maintaining its ability to train a well-performing model, thus reducing the storage cost and …

DeepCore: A comprehensive library for coreset selection in deep learning

C Guo, B Zhao, Y Bai - International Conference on Database and Expert …, 2022 - Springer
Coreset selection, which aims to select a subset of the most informative training samples, is
a long-standing learning problem that can benefit many downstream tasks such as data …

Demystifying CLIP data

H Xu, S Xie, XE Tan, PY Huang, R Howes… - arXiv preprint arXiv …, 2023 - arxiv.org
Contrastive Language-Image Pre-training (CLIP) is an approach that has advanced
research and applications in computer vision, fueling modern recognition systems and …

Submodularity in data subset selection and active learning

K Wei, R Iyer, J Bilmes - International conference on …, 2015 - proceedings.mlr.press
We study the problem of selecting a subset of big data to train a classifier while incurring
minimal performance loss. We show the connection of submodularity to the data likelihood …

A hybrid machine learning model for intrusion detection in VANET

H Bangui, M Ge, B Buhnova - Computing, 2022 - Springer
While Vehicular Ad-hoc Network (VANET) is developed to enable effective vehicle
communication and traffic information exchange, VANET is also vulnerable to different …