Widely observed neural scaling laws, in which error falls off as a power of the training set size, model size, or both, have driven substantial performance improvements in deep …
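A minimal sketch of the power-law relationship this snippet refers to, err(N) ≈ a·N^(−α), fit by linear regression in log-log space; the dataset sizes and error values below are hypothetical illustrations, not numbers from the paper.

```python
import numpy as np

# Hypothetical test errors measured at increasing training-set sizes N.
N = np.array([1e3, 1e4, 1e5, 1e6])
err = np.array([0.42, 0.21, 0.105, 0.052])

# A power law err(N) ~ a * N**(-alpha) is linear in log-log space:
# log(err) = log(a) - alpha * log(N), so fit a degree-1 polynomial.
slope, intercept = np.polyfit(np.log(N), np.log(err), 1)
alpha, a = -slope, np.exp(intercept)

print(f"fitted exponent alpha ~ {alpha:.2f}")
# Extrapolated error at a larger (hypothetical) dataset size:
print(f"predicted err at N=1e7: {a * (1e7) ** (-alpha):.4f}")
```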
Over recent years, an increasing amount of compute and data has been poured into training large language models (LLMs), usually by doing one-pass learning on as many tokens as …
A few years ago, the first CNN surpassed human performance on ImageNet. However, it soon became clear that machines lack robustness on more challenging test cases, a major …
Biological and artificial information processing systems form representations that they can use to categorize, reason, plan, navigate, and make decisions. How can we measure the …
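The question of comparing representations across systems is often operationalized with a similarity index; below is a minimal sketch of one common choice, linear centered kernel alignment (CKA), offered as an illustration rather than as the measure this particular paper proposes. The activation matrices X and Y are hypothetical (rows indexed by the same stimuli for two different systems).

```python
import numpy as np

def linear_cka(X: np.ndarray, Y: np.ndarray) -> float:
    """Linear CKA between two representation matrices whose rows are
    aligned to the same stimuli (n_samples x n_features)."""
    # Center each feature so the index is invariant to per-feature means.
    X = X - X.mean(axis=0, keepdims=True)
    Y = Y - Y.mean(axis=0, keepdims=True)
    # HSIC-style cross- and self-similarity terms.
    cross = np.linalg.norm(Y.T @ X, "fro") ** 2
    return cross / (np.linalg.norm(X.T @ X, "fro") * np.linalg.norm(Y.T @ Y, "fro"))

# Toy usage with hypothetical activations for 100 shared stimuli.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 64))             # system A
Y_related = X @ rng.normal(size=(64, 32))  # system B: linear readout of A
Y_random = rng.normal(size=(100, 32))      # unrelated system
print(linear_cka(X, Y_related), linear_cka(X, Y_random))  # first value is much higher
```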
H Tan, S Wu, F Du, Y Chen, Z Wang… - Advances in Neural …, 2024 - proceedings.neurips.cc
In this paper, we propose a novel data-pruning approach called moving-one-sample-out (MoSo), which aims to identify and remove the least informative samples from the training …
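As a rough illustration of the moving-one-sample-out idea (not the paper's exact estimator), the sketch below scores each training sample by how well its loss gradient aligns with the mean gradient of the remaining samples, then prunes the lowest-scoring ones; `model`, `loss_fn`, and the per-sample tensors `xs`, `ys` are hypothetical placeholders.

```python
import torch

def moso_style_scores(model, loss_fn, xs, ys):
    """Score each sample by the inner product between its loss gradient and
    the mean gradient of the remaining samples (a crude leave-one-out surrogate)."""
    per_sample_grads = []
    for x, y in zip(xs, ys):
        loss = loss_fn(model(x.unsqueeze(0)), y.unsqueeze(0))
        grads = torch.autograd.grad(loss, [p for p in model.parameters() if p.requires_grad])
        per_sample_grads.append(torch.cat([g.flatten() for g in grads]))
    G = torch.stack(per_sample_grads)        # (n_samples, n_params)
    total = G.sum(dim=0)
    n = G.shape[0]
    # Alignment of each sample's gradient with the mean gradient of the others.
    return (G * (total - G) / (n - 1)).sum(dim=1)

# Usage sketch: keep the 80% of samples with the highest scores.
# scores = moso_style_scores(model, torch.nn.functional.cross_entropy, xs, ys)
# keep_idx = scores.argsort(descending=True)[: int(0.8 * len(xs))]
```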
P Sun, B Shi, D Yu, T Lin - … of the IEEE/CVF Conference on …, 2024 - openaccess.thecvf.com
Contemporary machine learning, which involves training large neural networks on massive datasets, faces significant computational challenges. Dataset distillation as a recent …
Instruction tuning has unlocked powerful capabilities in large language models (LLMs), effectively using combined datasets to develop general-purpose chatbots. However, real …
Y He, L Xiao, JT Zhou - Advances in Neural Information …, 2023 - proceedings.neurips.cc
Dataset condensation is a crucial tool for enhancing training efficiency by reducing the size of the training dataset, particularly in on-device scenarios. However, these scenarios have …
L Wei, Z Jiang, W Huang, L Sun - arXiv preprint arXiv:2308.12067, 2023 - arxiv.org
Multimodal large language models acquire their instruction-following capabilities through a two-stage training process: pre-training on image-text pairs and fine-tuning on supervised …