Anytime-Valid Confidence Sequences for Consistent Uncertainty Estimation in Early-Exit Neural Networks

M Jazbec, P Forré, S Mandt, D Zhang… - arXiv preprint arXiv …, 2023 - arxiv.org
Early-exit neural networks (EENNs) facilitate adaptive inference by producing predictions at
multiple stages of the forward pass. In safety-critical applications, these predictions are only …
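
The snippet only sketches the mechanism, so here is a rough illustration of a confidence-thresholded early-exit forward pass; the module layout and the max-probability gate are assumptions made for the sketch, not the paper's method (the paper is about attaching anytime-valid confidence sequences to such exits):

```python
# Minimal early-exit sketch (illustrative assumptions throughout).
import torch
import torch.nn as nn


class TinyEENN(nn.Module):
    """A toy early-exit network: every block gets its own classifier head."""

    def __init__(self, dim: int = 32, num_classes: int = 10, depth: int = 4):
        super().__init__()
        self.blocks = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, dim), nn.ReLU()) for _ in range(depth)
        )
        self.heads = nn.ModuleList(nn.Linear(dim, num_classes) for _ in range(depth))

    @torch.no_grad()
    def forward(self, x: torch.Tensor, threshold: float = 0.9) -> torch.Tensor:
        # Run blocks in order; stop as soon as a head is confident enough.
        # Assumes a single example for simplicity.
        for block, head in zip(self.blocks, self.heads):
            x = block(x)
            probs = torch.softmax(head(x), dim=-1)
            if probs.max().item() >= threshold:  # confidence gate
                return probs  # early exit: skip the remaining blocks
        return probs  # fell through to the final head
```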

Energy-Efficient Inference With Software-Hardware Co-Design for Sustainable Artificial Intelligence of Things

S Dai, Z Luo, W Luo, S Wang, C Dai… - IEEE Internet of …, 2024 - ieeexplore.ieee.org
The emerging field of Artificial Intelligence of Things (AIoT) is propelled by the remarkable
success of deep learning and hardware evolution, which has a significant impact on our …

On the Role of Depth and Looping for In-Context Learning with Task Diversity

K Gatmiry, N Saunshi, SJ Reddi, S Jegelka… - arXiv preprint arXiv …, 2024 - arxiv.org
The intriguing in-context learning (ICL) abilities of deep Transformer models have lately
garnered significant attention. By studying in-context linear regression on unimodal …

Fast yet Safe: Early-Exiting with Risk Control

M Jazbec, A Timans, TH Veljković, K Sakmann… - arXiv preprint arXiv …, 2024 - arxiv.org
Scaling machine learning models significantly improves their performance. However, such
gains come at the cost of inference being slow and resource-intensive. Early-exit neural …

DuoDiff: Accelerating Diffusion Models with a Dual-Backbone Approach

DG Fernández, RA Matişan, AM Muñoz… - arXiv preprint arXiv …, 2024 - arxiv.org
Diffusion models have achieved unprecedented performance in image generation, yet they
suffer from slow inference due to their iterative sampling process. To address this, early …

RAEE: A Training-Free Retrieval-Augmented Early Exiting Framework for Efficient Inference

L Huang, S Wu, Y Cui, Y Xiong, X Liu, TW Kuo… - arXiv preprint arXiv …, 2024 - arxiv.org
Deploying large language models for inference remains challenging due to their high
computational overhead. Early exiting accelerates model inference by adaptively reducing …

Cascade-Aware Training of Language Models

C Wang, S Augenstein, K Rush, W Jitkrittum… - arXiv preprint arXiv …, 2024 - arxiv.org
Reducing serving cost and latency is a fundamental concern for the deployment of language
models (LMs) in business applications. To address this, cascades of LMs offer an effective …
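
As a rough picture of how such a cascade routes queries, the sketch below answers with a small model and defers to a large one when a confidence score falls short; the function names and the deferral rule are illustrative assumptions, and the paper's actual contribution, training the small model to be cascade-aware, is not shown:

```python
# Two-model LM cascade sketch (names and thresholds are assumptions).
def cascade_generate(prompt, small_lm, large_lm, confidence, tau=0.8):
    """Answer with the small model; defer to the large one when unsure.

    small_lm / large_lm: callables mapping a prompt to a completion.
    confidence: callable scoring the small model's output in [0, 1],
                e.g. its mean token probability.
    """
    draft = small_lm(prompt)
    if confidence(prompt, draft) >= tau:
        return draft            # cheap path: small model is confident
    return large_lm(prompt)     # expensive path: defer to the large model
```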

Breaking the Ceiling of the LLM Community by Treating Token Generation as a Classification for Ensembling

YC Yu, CC Kuo, Z Ye, YC Chang, YS Li - arXiv preprint arXiv:2406.12585, 2024 - arxiv.org
Ensembling multiple models has always been an effective approach to push the limits of
existing performance and is widely used in classification tasks by simply averaging the …
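
The averaging the snippet alludes to can be made concrete at the token level: treat each decoding step as a classification over a shared vocabulary and average the models' next-token distributions. The sketch below is a generic version of that recipe, not necessarily the paper's exact scheme:

```python
# Token-level ensembling by averaging next-token distributions.
import torch


def ensemble_next_token(logits_per_model: list[torch.Tensor]) -> int:
    """Average per-model next-token distributions and pick the argmax.

    Each tensor has shape (vocab_size,); the models must share a
    vocabulary for this naive averaging to be well defined.
    """
    probs = torch.stack([torch.softmax(l, dim=-1) for l in logits_per_model])
    return int(probs.mean(dim=0).argmax().item())
```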

Dynamic Vocabulary Pruning in Early-Exit LLMs

J Vincenti, KA Sadek, J Velja, M Nulli… - arXiv preprint arXiv …, 2024 - arxiv.org
Increasing the size of large language models (LLMs) has been shown to lead to better
performance. However, this comes at the cost of slower and more expensive inference. Early …
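
Judging from the title, the idea is to shrink the exit classifier's softmax from the full vocabulary to a small candidate set, since scoring every token at every exit is what makes exit decisions expensive in LLMs. The sketch below is a hypothetical rendering of that idea; the candidate-selection rule (e.g. top-k from an early layer) is left to the caller and is an assumption, not the paper's method:

```python
# Hypothetical pruned-vocabulary exit-confidence computation.
import torch


def pruned_exit_confidence(hidden: torch.Tensor,
                           unembedding: torch.Tensor,
                           candidate_ids: torch.Tensor) -> torch.Tensor:
    """Exit confidence over a pruned vocabulary instead of the full one.

    hidden:        (d,) hidden state at the early exit.
    unembedding:   (V, d) full output-embedding matrix.
    candidate_ids: (k,) indices of the tokens kept after pruning, k << V.
    """
    logits = unembedding[candidate_ids] @ hidden   # (k,) instead of (V,)
    return torch.softmax(logits, dim=-1).max()
```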