AlphaLoRA: Assigning LoRA Experts Based on Layer Training Quality

P Qing, C Gao, Y Zhou, X Diao, Y Yang… - arXiv preprint arXiv …, 2024 - arxiv.org
Parameter-efficient fine-tuning methods, such as Low-Rank Adaptation (LoRA), are known
to enhance training efficiency in Large Language Models (LLMs). Due to the limited …
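
As context for the entry above: the generic LoRA update freezes a pretrained weight matrix W and learns a low-rank correction, y = Wx + (alpha/r)·BAx. A minimal PyTorch sketch of that basic update follows; the class name and hyperparameters are illustrative, and this is not the paper's expert-allocation scheme (AlphaLoRA assigns varying numbers of LoRA experts per layer, which the snippet does not detail).

    import torch
    import torch.nn as nn

    class LoRALinear(nn.Module):
        """Frozen linear layer plus a trainable low-rank update B @ A."""
        def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
            super().__init__()
            self.base = base
            for p in self.base.parameters():
                p.requires_grad = False  # freeze the pretrained weights
            self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
            self.B = nn.Parameter(torch.zeros(base.out_features, r))  # zero init: no change at step 0
            self.scale = alpha / r

        def forward(self, x):
            return self.base(x) + self.scale * (x @ self.A.T) @ self.B.T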

Using Uncertainty Quantification to Characterize and Improve Out-of-Domain Learning for PDEs

SC Mouli, DC Maddix, S Alizadeh, G Gupta… - arXiv preprint arXiv …, 2024 - arxiv.org
Existing work in scientific machine learning (SciML) has shown that data-driven learning of
solution operators can provide a fast approximate alternative to classical numerical partial …
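
Concretely, "learning a solution operator" here means fitting a parametric map \mathcal{G}_\theta \approx \mathcal{G}^\dagger : a \mapsto u, where u is the PDE solution associated with input a (coefficients, forcing, or initial condition), so new instances are answered by a forward pass rather than a numerical solve. (Standard operator-learning framing, stated as background; the paper's uncertainty-quantification method is not shown in the snippet.)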

Temperature Optimization for Bayesian Deep Learning

K Ng, C van der Heide, L Hodgkinson, S Wei - arXiv preprint arXiv …, 2024 - arxiv.org
The Cold Posterior Effect (CPE) is a phenomenon in Bayesian Deep Learning (BDL), where
tempering the posterior to a cold temperature often improves the predictive performance of …
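
For reference, the tempered posterior behind the CPE is the standard construction: for a temperature T > 0,

    p_T(\theta \mid \mathcal{D}) \propto \big( p(\mathcal{D} \mid \theta)\, p(\theta) \big)^{1/T},

where T < 1 ("cold") sharpens the posterior around its modes; the CPE is the repeated empirical finding that such sharpening improves predictive metrics. (Standard definition, given here as background; the paper's procedure for optimizing T is not shown in the snippet.)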

Crafting Heavy-Tails in Weight Matrix Spectrum without Gradient Noise

V Kothapalli, T Pang, S Deng, Z Liu, Y Yang - arXiv preprint arXiv …, 2024 - arxiv.org
Modern training strategies of deep neural networks (NNs) tend to induce heavy-tailed (HT)
spectra of layer weights. Extensive efforts to study this phenomenon have found that NNs …
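
A common way to inspect such spectra is to compute the empirical spectral density (ESD) of W^T W / n for each layer and estimate the power-law exponent of its tail. A minimal NumPy sketch, using the Hill estimator as a stand-in for the fuller power-law fits used in this literature (the function name and the choice of k are illustrative):

    import numpy as np

    def esd_tail_alpha(W, k=50):
        """Eigenvalues of W^T W / n and a Hill estimate of the tail exponent."""
        n = W.shape[0]
        eigs = np.sort(np.linalg.svd(W, compute_uv=False) ** 2 / n)
        tail = eigs[-(k + 1):]             # k+1 largest eigenvalues; tail[0] is the cutoff
        alpha = 1.0 + k / np.sum(np.log(tail[1:] / tail[0]))
        return eigs, alpha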

A PAC-Bayesian Perspective on the Interpolating Information Criterion

L Hodgkinson, C van der Heide, R Salomone… - arXiv preprint arXiv …, 2023 - arxiv.org
Deep learning is renowned for its theory-practice gap, whereby principled theory typically
fails to provide much beneficial guidance for implementation in practice. This has been …
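
As background for the entry above: the classical McAllester-style PAC-Bayes bound, for a loss in [0, 1], states that with probability at least 1 - \delta over an i.i.d. sample of size n, simultaneously for all posteriors \rho,

    \mathbb{E}_{h \sim \rho}[L(h)] \le \mathbb{E}_{h \sim \rho}[\hat{L}(h)] + \sqrt{ \frac{ \mathrm{KL}(\rho \,\|\, \pi) + \ln(2\sqrt{n}/\delta) }{ 2n } },

where \pi is a prior fixed before seeing the data and \hat{L} is the empirical risk. (This bound is context for the PAC-Bayesian framing, not the paper's Interpolating Information Criterion itself.)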

Gibbs-Based Information Criteria and the Over-Parameterized Regime

H Chen, GW Wornell, Y Bu - International Conference on …, 2024 - proceedings.mlr.press
Double-descent refers to the unexpected drop in test loss of a learning algorithm beyond an
interpolating threshold with over-parameterization, which is not predicted by information …
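
The double-descent curve the snippet refers to is easy to reproduce in a misspecified least-squares model, where the min-norm fit's test risk peaks near the interpolation threshold d = n and falls again beyond it. A small self-contained NumPy sketch (the dimensions and noise level are arbitrary choices, not from the paper):

    import numpy as np
    rng = np.random.default_rng(0)

    def test_risk(d, n=40, D=80, n_test=2000, sigma=0.5, reps=20):
        """Avg. test risk of min-norm least squares using the first d of D true features."""
        risks = []
        for _ in range(reps):
            beta = rng.normal(size=D) / np.sqrt(D)
            X, Xt = rng.normal(size=(n, D)), rng.normal(size=(n_test, D))
            y = X @ beta + sigma * rng.normal(size=n)
            bhat = np.linalg.lstsq(X[:, :d], y, rcond=None)[0]  # min-norm when d > n
            risks.append(np.mean((Xt @ beta - Xt[:, :d] @ bhat) ** 2))
        return float(np.mean(risks))

    for d in (10, 30, 40, 50, 80):   # risk spikes near the threshold d = n = 40
        print(d, round(test_risk(d), 3))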

On Implicit Smoothness Regularization in Deep Learning

M Gamba - 2024 - diva-portal.org
State of the art neural networks provide a rich class of function approximators, fueling the
remarkable success of gradient-based deep learning on complex high-dimensional …

An Asymptotically Optimal Method for Constrained Stochastic Optimization

S Na, Y Gao, MK Ng - senna1128.github.io
We perform statistical inference for the solution of stochastic optimization problems with
equality and box inequality constraints. The considered problems are prevalent in statistics …
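
The problem class described in the snippet has the generic form (the standard template for equality plus box constraints, written here as an assumption about the paper's setting):

    \min_{x \in \mathbb{R}^d} \; \mathbb{E}_{\xi}\big[ f(x; \xi) \big] \quad \text{s.t.} \quad c(x) = 0, \quad \ell \le x \le u,

where \xi is the random data, c encodes the equality constraints, and \ell, u are the box bounds; per the snippet, the paper performs statistical inference for the solution of such problems.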