Rethinking the role of scale for in-context learning: An interpretability-based case study at 66 billion scale

H Bansal, K Gopalakrishnan, S Dingliwal… - arXiv preprint arXiv …, 2022 - arxiv.org
Language models have been shown to perform better with an increase in scale on a wide
variety of tasks via the in-context learning paradigm. In this paper, we investigate the …

Submodular minimax optimization: Finding effective sets

LR Mualem, ER Elenberg… - International …, 2024 - proceedings.mlr.press
Despite the rich existing literature about minimax optimization in continuous settings, only
very partial results of this kind have been obtained for combinatorial settings. In this paper …

DASS: differentiable architecture search for sparse neural networks

H Mousavi, M Loni, M Alibeigi… - ACM Transactions on …, 2023 - dl.acm.org
The deployment of Deep Neural Networks (DNNs) on edge devices is hindered by the
substantial gap between performance requirements and available computational power …

Sequential attention for feature selection

T Yasuda, MH Bateni, L Chen, M Fahrbach… - arXiv preprint arXiv …, 2022 - arxiv.org
Feature selection is the problem of selecting a subset of features for a machine learning
model that maximizes model quality subject to a budget constraint. For neural networks …

A survey of lottery ticket hypothesis

B Liu, Z Zhang, P He, Z Wang, Y Xiao, R Ye… - arXiv preprint arXiv …, 2024 - arxiv.org
The Lottery Ticket Hypothesis (LTH) states that a dense neural network model contains a
highly sparse subnetwork (i.e., winning tickets) that can achieve even better performance …

Structured pruning of neural networks for constraints learning

M Cacciola, A Frangioni, A Lodi - arXiv preprint arXiv:2307.07457, 2023 - arxiv.org
In recent years, the integration of Machine Learning (ML) models with Operations Research
(OR) tools has gained popularity across diverse applications, including cancer treatment …

Bicriteria approximation algorithms for the submodular cover problem

W Chen, V Crawford - Advances in Neural Information …, 2024 - proceedings.neurips.cc
In this paper, we consider the optimization problem Submodular Cover (SCP), which is to
find a minimum cardinality subset of a finite universe $U$ such that the value of a …

Talking Models: Distill Pre-trained Knowledge to Downstream Models via Interactive Communication

Z Zhao, Q Liu, H Gui, B An, L Hong, EH Chi - arXiv preprint arXiv …, 2023 - arxiv.org
Many recent breakthroughs in machine learning have been enabled by the pre-trained
foundation models. By scaling up model parameters, training data, and computation …

Neural Network Reduction with Guided Regularizers

AHM Rafid, A Sandu - arXiv preprint arXiv:2305.18448, 2023 - arxiv.org
Regularization techniques such as $\mathcal{L}_1$ and $\mathcal{L}_2$ regularizers
are effective in sparsifying neural networks (NNs). However, to remove a certain neuron or …

Deep neural networks pruning via the structured perspective regularization

M Cacciola, A Frangioni, X Li, A Lodi - SIAM Journal on Mathematics of Data …, 2023 - SIAM
In machine learning, artificial neural networks (ANNs) are a very powerful tool, broadly used
in many applications. Often, the selected (deep) architectures include many layers, and …