Authors
Kashyap Chitta, José M Álvarez, Elmar Haussmann, Clément Farabet
Publication date
2021/12/31
Journal
IEEE Transactions on Intelligent Transportation Systems
Publisher
IEEE
Abstract
Deep Neural Networks (DNNs) often rely on vast datasets for training. Given the large size of such datasets, it is conceivable that they contain specific samples that either do not contribute or negatively impact the DNN’s optimization. Modifying the training distribution to exclude such samples could provide an effective solution to improve performance and reduce training time. This paper proposes to scale up ensemble Active Learning (AL) methods to perform acquisition at a large scale (10k to 500k samples at a time). We do this with ensembles of hundreds of models, obtained at a minimal computational cost by reusing intermediate training checkpoints. This allows us to automatically and efficiently perform a training data subset search for large labeled datasets. We observe that our approach obtains favorable subsets of training data, which can be used to train more accurate DNNs than training with the entire …
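The abstract describes scoring candidate samples with an ensemble built cheaply from intermediate training checkpoints, then acquiring large batches at once. A minimal sketch of one common ensemble acquisition criterion (BALD-style mutual information over checkpoint predictions) is shown below; this is an illustration under assumed shapes and function names, not the paper's exact acquisition function.

```python
import numpy as np

def ensemble_acquisition_scores(checkpoint_probs):
    """Score each sample by ensemble disagreement (mutual information).

    checkpoint_probs: (n_checkpoints, n_samples, n_classes) softmax outputs,
    one slice per reused intermediate training checkpoint (hypothetical layout).
    Returns a (n_samples,) array; higher means more member disagreement.
    """
    eps = 1e-12
    mean_probs = checkpoint_probs.mean(axis=0)                      # (n_samples, n_classes)
    # Entropy of the averaged prediction: total predictive uncertainty.
    entropy_of_mean = -np.sum(mean_probs * np.log(mean_probs + eps), axis=-1)
    # Average entropy of each ensemble member: "expected" uncertainty.
    member_entropy = -np.sum(checkpoint_probs * np.log(checkpoint_probs + eps), axis=-1)
    mean_entropy = member_entropy.mean(axis=0)
    # Their gap (mutual information) isolates disagreement between members.
    return entropy_of_mean - mean_entropy

def select_subset(checkpoint_probs, k):
    """Acquire the k highest-disagreement samples in one large batch."""
    scores = ensemble_acquisition_scores(checkpoint_probs)
    return np.argsort(scores)[::-1][:k]

# Toy usage: 5 checkpoints, 1000 unlabeled samples, 10 classes.
rng = np.random.default_rng(0)
probs = rng.dirichlet(np.ones(10), size=(5, 1000))
chosen = select_subset(probs, k=100)
```

Because the ensemble members are free byproducts of a single training run, this kind of scoring scales to the 10k-500k-sample acquisition batches the abstract mentions; the checkpoint count and batch sizes here are illustrative only.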
Total citations
[Citations-per-year chart, 2019–2024]
Scholar articles
K Chitta, JM Álvarez, E Haussmann, C Farabet - IEEE Transactions on Intelligent Transportation …, 2021