Selecting suitable architecture parameters and training hyperparameters is essential for enhancing machine learning (ML) model performance. Several recent empirical studies …
We investigate the spectral properties of linear-width feed-forward neural networks, where the sample size is asymptotically proportional to network width. Empirically, we show that the …
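A convenient entry point to the kind of spectral analysis this line of work performs is the empirical spectral distribution (ESD) of a weight or Gram matrix in the proportional, linear-width regime. The sketch below is illustrative only; the dimensions, Gaussian initialization, and n/d ratio are assumptions, not the paper's setup:

```python
import numpy as np

# Linear-width regime: sample size n grows proportionally to width d (ratio assumed).
n, d = 1000, 500
rng = np.random.default_rng(0)

# Random Gaussian weights, scaled so the spectrum stays O(1).
W = rng.normal(0.0, 1.0, size=(n, d)) / np.sqrt(n)

# Empirical spectral distribution (ESD): eigenvalues of the Gram matrix W^T W.
esd = np.linalg.eigvalsh(W.T @ W)
print(f"lambda_min={esd.min():.3f}, lambda_max={esd.max():.3f}")
```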
Y Zhou, Y Yang, A Chang… - … on Machine Learning, 2023 - proceedings.mlr.press
Recent work has highlighted the complex influence training hyperparameters, e.g., the number of training epochs, can have on the prunability of machine learning models …
J Xu, W Chen, Y Zhao, Y Wei - Proceedings of the IEEE …, 2024 - openaccess.thecvf.com
The recent success of pre-trained foundation vision-language models makes Open-Vocabulary Segmentation (OVS) possible. Despite the promising performance, this approach introduces …
To better understand the good generalization performance of state-of-the-art neural network (NN) models, and in particular the success of the ALPHAHAT metric based on Heavy-Tailed …
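In the Heavy-Tailed Self-Regularization literature, ALPHAHAT combines the power-law exponent alpha fitted to a layer's eigenvalue spectrum with the log of its largest eigenvalue, averaged over layers. The sketch below uses the standard continuous power-law (Hill-type) MLE; the function name and quantile-based xmin are hypothetical, and the published metric's xmin selection and layer weighting differ:

```python
import numpy as np

def alpha_hat(weight_matrices, xmin_quantile=0.5):
    """Rough ALPHAHAT-style score: mean over layers of alpha * log10(lambda_max).

    alpha is fit to the upper tail of each layer's eigenvalue spectrum with the
    continuous power-law MLE. Illustrative sketch, not the published estimator.
    """
    scores = []
    for W in weight_matrices:
        # Eigenvalues of the layer's correlation matrix W^T W / N.
        N = max(W.shape)
        evals = np.linalg.eigvalsh(W.T @ W / N)
        evals = evals[evals > 0]
        # Fit the tail above a quantile-based xmin (assumed heuristic).
        xmin = np.quantile(evals, xmin_quantile)
        tail = evals[evals >= xmin]
        alpha = 1.0 + len(tail) / np.sum(np.log(tail / xmin))
        scores.append(alpha * np.log10(evals.max()))
    return float(np.mean(scores))

rng = np.random.default_rng(0)
layers = [rng.normal(size=(256, 128)) for _ in range(4)]
print(alpha_hat(layers))
```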
X Meng, J Yao - The Journal of Machine Learning Research, 2023 - dl.acm.org
Much recent research effort has been devoted to explaining the success of deep learning. Random Matrix Theory (RMT) provides an emerging way to this end by analyzing the …
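A common RMT baseline in these analyses is the Marchenko-Pastur (MP) law, which describes the ESD of a pure-noise weight matrix; deviations from MP, such as heavy tails or outlier spikes, are read as signatures of learning. A minimal sketch of the MP density, with matrix sizes and variance chosen for illustration:

```python
import numpy as np

def marchenko_pastur_pdf(lam, c, sigma2=1.0):
    """MP density for eigenvalues of W^T W / n, with W an n x d noise matrix, c = d/n <= 1."""
    lam_minus = sigma2 * (1 - np.sqrt(c)) ** 2
    lam_plus = sigma2 * (1 + np.sqrt(c)) ** 2
    inside = (lam > lam_minus) & (lam < lam_plus)
    pdf = np.zeros_like(lam, dtype=float)
    pdf[inside] = np.sqrt((lam_plus - lam[inside]) * (lam[inside] - lam_minus)) / (
        2 * np.pi * sigma2 * c * lam[inside]
    )
    return pdf

# Compare a noise matrix's ESD against the MP curve.
n, d = 2000, 500
rng = np.random.default_rng(1)
W = rng.normal(size=(n, d))
esd = np.linalg.eigvalsh(W.T @ W / n)
grid = np.linspace(esd.min(), esd.max(), 200)
print(marchenko_pastur_pdf(grid, c=d / n)[:5])
```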
Learning representations of well-trained neural network models holds the promise to provide an understanding of the inner workings of those models. However, previous work …
K Schürholt - arXiv preprint arXiv:2410.05107, 2024 - arxiv.org
This thesis addresses the challenge of understanding Neural Networks through the lens of their most fundamental component: the weights, which encapsulate the learned information …
ML Wolff, S Yang, K Torkkola, MW Mahoney - arXiv preprint arXiv …, 2025 - arxiv.org
Pre-trained Large Language Models (LLMs) encapsulate large amounts of knowledge and take enormous amounts of compute to train. We make use of this resource, together with the …