A survey on federated learning for resource-constrained IoT devices

A Imteaj, U Thakker, S Wang, J Li… - IEEE Internet of Things …, 2021 - ieeexplore.ieee.org
Federated learning (FL) is a distributed machine learning strategy that generates a global
model by learning from multiple decentralized edge clients. FL enables on-device training …
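
The global-model construction the abstract alludes to is typically done with FedAvg-style weighted averaging, the canonical FL aggregation rule (not specific to this survey). A minimal sketch, assuming NumPy arrays as parameters; the function name and toy data are illustrative:

```python
import numpy as np

def fedavg(client_weights, client_sizes):
    """Aggregate client models into a global model, weighting each
    client by its local dataset size (FedAvg-style aggregation)."""
    total = sum(client_sizes)
    # Weighted average of each parameter array across clients.
    return [
        sum(w[i] * (n / total) for w, n in zip(client_weights, client_sizes))
        for i in range(len(client_weights[0]))
    ]

# Example: three clients, each holding two parameter arrays.
clients = [[np.ones(4) * k, np.ones(2) * k] for k in (1.0, 2.0, 3.0)]
sizes = [100, 200, 700]
global_model = fedavg(clients, sizes)
print(global_model[0])  # -> weighted mean 2.6 in every entry
```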

Scientometric review of artificial intelligence for operations & maintenance of wind turbines: The past, present and future

J Chatterjee, N Dethlefs - Renewable and Sustainable Energy Reviews, 2021 - Elsevier
Wind energy has emerged as a highly promising source of renewable energy in recent
times. However, wind turbines regularly suffer from operational inconsistencies, leading to …

Efficient memory management for large language model serving with PagedAttention

W Kwon, Z Li, S Zhuang, Y Sheng, L Zheng… - Proceedings of the 29th …, 2023 - dl.acm.org
High throughput serving of large language models (LLMs) requires batching sufficiently
many requests at a time. However, existing systems struggle because the key-value cache …
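
The title hints at the mechanism: the KV cache is managed in fixed-size blocks, analogous to OS paging, so memory is allocated on demand rather than reserved per maximum sequence length. A toy sketch of that idea, with hypothetical class and method names (this is not vLLM's actual API):

```python
class PagedKVCache:
    """Toy block table in the spirit of PagedAttention: each sequence's
    KV cache is a list of fixed-size physical blocks allocated lazily."""

    def __init__(self, num_blocks, block_size):
        self.block_size = block_size
        self.free_blocks = list(range(num_blocks))  # physical block ids
        self.block_tables = {}                      # seq_id -> [block ids]
        self.seq_lens = {}                          # seq_id -> tokens stored

    def append_token(self, seq_id):
        """Reserve room for one more token; allocate only on a block boundary."""
        n = self.seq_lens.get(seq_id, 0)
        if n % self.block_size == 0:                # current block is full
            if not self.free_blocks:
                raise MemoryError("KV cache exhausted; preempt a sequence")
            self.block_tables.setdefault(seq_id, []).append(self.free_blocks.pop())
        self.seq_lens[seq_id] = n + 1

    def free(self, seq_id):
        """Return a finished sequence's blocks to the free pool."""
        self.free_blocks.extend(self.block_tables.pop(seq_id, []))
        self.seq_lens.pop(seq_id, None)

cache = PagedKVCache(num_blocks=8, block_size=16)
for _ in range(20):
    cache.append_token("req-0")       # 20 tokens occupy just 2 blocks,
print(cache.block_tables["req-0"])    # not a max-length reservation
```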

Efficient large-scale language model training on GPU clusters using Megatron-LM

D Narayanan, M Shoeybi, J Casper… - Proceedings of the …, 2021 - dl.acm.org
Large language models have led to state-of-the-art accuracies across several tasks.
However, training these models efficiently is challenging because: a) GPU memory capacity …
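
One ingredient of Megatron-LM-style training is tensor model parallelism, where a layer's weight matrix is sharded across GPUs. A NumPy simulation of a column-parallel linear layer, with devices simulated by plain arrays and the all-gather by concatenation; this is a sketch of the idea, not the library's API:

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, n_dev = 8, 12, 4

W = rng.normal(size=(d_in, d_out))
shards = np.split(W, n_dev, axis=1)   # each "device" holds d_out/n_dev columns

x = rng.normal(size=(2, d_in))        # a batch of activations
partial = [x @ Wi for Wi in shards]   # computed independently per device
y = np.concatenate(partial, axis=1)   # all-gather along the hidden dimension

assert np.allclose(y, x @ W)          # matches the unsharded result
```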

Putting NeRF on a diet: Semantically consistent few-shot view synthesis

A Jain, M Tancik, P Abbeel - Proceedings of the IEEE/CVF …, 2021 - openaccess.thecvf.com
We present DietNeRF, a 3D neural scene representation estimated from a few images.
Neural Radiance Fields (NeRF) learn a continuous volumetric representation of a scene …
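
For reference, the volume rendering integral behind the "continuous volumetric representation" mentioned here, in the standard NeRF formulation (not specific to DietNeRF):

```latex
% Expected color of camera ray r(t) = o + t d, accumulated between
% near and far bounds t_n, t_f; sigma is density, c is view-dependent color:
C(\mathbf{r}) = \int_{t_n}^{t_f} T(t)\,\sigma(\mathbf{r}(t))\,
                \mathbf{c}(\mathbf{r}(t), \mathbf{d})\,dt,
\qquad
T(t) = \exp\!\Big(-\int_{t_n}^{t} \sigma(\mathbf{r}(s))\,ds\Big)
```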

ZeRO-Infinity: Breaking the GPU memory wall for extreme scale deep learning

S Rajbhandari, O Ruwase, J Rasley, S Smith… - Proceedings of the …, 2021 - dl.acm.org
In the last three years, the largest dense deep learning models have grown over 1000x to
reach hundreds of billions of parameters, while the GPU memory has only grown by 5x (16 …
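
A back-of-the-envelope calculation shows why this gap forces offloading beyond GPU memory. The 500B-parameter model below is a hypothetical figure, and the 16-bytes-per-parameter accounting assumes mixed-precision Adam training:

```python
# Why partitioning across GPUs alone eventually runs out of memory:
# model state for mixed-precision Adam is ~16 bytes per parameter.
n_params = 500e9                      # hypothetical 500B-parameter model
state_tb = n_params * 16 / 2**40
print(f"model state: {state_tb:.1f} TB")     # ~7.3 TB
print(f"one 80 GB GPU: {80 / 1024:.3f} TB")  # ~0.078 TB
# ZeRO-Infinity's response: keep partitioning across GPUs, then spill
# what remains to CPU DRAM and NVMe, trading bandwidth for capacity.
```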

ZeRO: Memory optimizations toward training trillion parameter models

S Rajbhandari, J Rasley, O Ruwase… - … Conference for High …, 2020 - ieeexplore.ieee.org
Large deep learning models offer significant accuracy gains, but training billions to trillions
of parameters is challenging. Existing solutions such as data and model parallelisms exhibit …
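
ZeRO's three stages progressively partition optimizer states, gradients, and parameters across data-parallel ranks. A sketch of the per-GPU memory accounting, following the paper's 2 + 2 + K bytes-per-parameter model with K = 12 for mixed-precision Adam; the 7.5B-parameter, 64-GPU numbers are illustrative:

```python
# Per-GPU model-state memory under ZeRO's stages: 2 bytes (fp16 params)
# + 2 bytes (fp16 grads) + K bytes of optimizer state per parameter,
# with K = 12 (fp32 master weights, Adam momentum and variance).
def zero_model_state_gb(n_params, n_gpus, stage):
    K = 12
    if stage == 0:                      # plain data parallelism: full replica
        per_param = 2 + 2 + K
    elif stage == 1:                    # partition optimizer states
        per_param = 2 + 2 + K / n_gpus
    elif stage == 2:                    # ... and gradients
        per_param = 2 + (2 + K) / n_gpus
    else:                               # stage 3: ... and parameters too
        per_param = (2 + 2 + K) / n_gpus
    return n_params * per_param / 2**30

for stage in range(4):
    gb = zero_model_state_gb(7.5e9, n_gpus=64, stage=stage)
    print(f"stage {stage}: {gb:.1f} GB per GPU")
# stage 0: ~111.8 GB (won't fit on one GPU); stage 3: ~1.7 GB
```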

Combined scaling for zero-shot transfer learning

H Pham, Z Dai, G Ghiasi, K Kawaguchi, H Liu, AW Yu… - Neurocomputing, 2023 - Elsevier
Recent developments in multimodal training methodologies, including CLIP and ALIGN,
obviate the necessity for individual data labeling. These approaches utilize pairs of data and …
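
CLIP/ALIGN-style training pairs each image with its caption and scores a batch with a symmetric contrastive loss, where matched pairs sit on the diagonal of the similarity matrix. A minimal NumPy sketch of that loss; the function name, temperature, and toy inputs are illustrative:

```python
import numpy as np

def clip_style_loss(img_emb, txt_emb, temperature=0.07):
    """Symmetric contrastive loss over paired image/text embeddings:
    the matched pair in each row/column is the positive class."""
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    logits = img @ txt.T / temperature           # (batch, batch) similarities

    def xent_diag(l):                            # cross-entropy, labels = diagonal
        l = l - l.max(axis=1, keepdims=True)     # numerical stability
        log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(log_probs))

    return (xent_diag(logits) + xent_diag(logits.T)) / 2

rng = np.random.default_rng(0)
print(clip_style_loss(rng.normal(size=(8, 64)), rng.normal(size=(8, 64))))
```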

Train big, then compress: Rethinking model size for efficient training and inference of transformers

Z Li, E Wallace, S Shen, K Lin… - International …, 2020 - proceedings.mlr.press
Since hardware resources are limited, the objective of training deep learning models is
typically to maximize accuracy subject to the time and memory constraints of training and …
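
The paper's thesis is to train a deliberately large model, then shrink it for deployment. One common compression step is pruning; the sketch below shows simple magnitude pruning and is illustrative rather than the paper's exact recipe:

```python
import numpy as np

def magnitude_prune(weights, sparsity=0.9):
    """Zero out the smallest-magnitude weights, keeping only the
    top (1 - sparsity) fraction by absolute value."""
    threshold = np.quantile(np.abs(weights).ravel(), sparsity)
    return np.where(np.abs(weights) >= threshold, weights, 0.0)

rng = np.random.default_rng(0)
W = rng.normal(size=(256, 256))
W_sparse = magnitude_prune(W, sparsity=0.9)
print(f"{(W_sparse == 0).mean():.0%} of weights removed")
```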

DAPPLE: A pipelined data parallel approach for training large models

S Fan, Y Rong, C Meng, Z Cao, S Wang… - Proceedings of the 26th …, 2021 - dl.acm.org
It is a challenging task to train large DNN models on sophisticated GPU platforms with
diversified interconnect capabilities. Recently, pipelined training has been proposed as an …
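
A toy model of the pipelining this abstract refers to: a minibatch is split into microbatches that stream through the stages concurrently, so stage s works on microbatch m while earlier stages already process later microbatches. This shows only the forward passes of a generic fill-drain schedule, not DAPPLE's actual interleaved forward/backward scheduler:

```python
def pipeline_schedule(num_stages, num_microbatches):
    """Yield (clock_tick, stage, microbatch) for the forward passes
    of a simple fill-drain pipeline."""
    for t in range(num_stages + num_microbatches - 1):
        for s in range(num_stages):
            m = t - s
            if 0 <= m < num_microbatches:
                yield t, s, m

for t, s, m in pipeline_schedule(num_stages=3, num_microbatches=4):
    print(f"tick {t}: stage {s} runs forward on microbatch {m}")
# 3 stages finish 4 microbatches in 6 ticks instead of 12 sequential ones.
```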