A survey on federated learning for resource-constrained IoT devices

A Imteaj, U Thakker, S Wang, J Li… - IEEE Internet of Things …, 2021 - ieeexplore.ieee.org
Federated learning (FL) is a distributed machine learning strategy that generates a global
model by learning from multiple decentralized edge clients. FL enables on-device training …
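
The global-model construction the abstract alludes to is typically done with FedAvg-style weighted averaging, the canonical FL aggregation rule (not specific to this survey). A minimal sketch, assuming NumPy arrays as parameters; the function name and toy data are illustrative:

```python
import numpy as np

def fedavg(client_weights, client_sizes):
    """Aggregate client models into a global model, weighting each
    client by its local dataset size (FedAvg-style aggregation)."""
    total = sum(client_sizes)
    # Weighted average of each parameter array across clients.
    return [
        sum(w[i] * (n / total) for w, n in zip(client_weights, client_sizes))
        for i in range(len(client_weights[0]))
    ]

# Example: three clients, each holding two parameter arrays.
clients = [[np.ones(4) * k, np.ones(2) * k] for k in (1.0, 2.0, 3.0)]
sizes = [100, 200, 700]
global_model = fedavg(clients, sizes)
print(global_model[0])  # -> weighted mean 2.6 in every entry
```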

Scientometric review of artificial intelligence for operations & maintenance of wind turbines: The past, present and future

J Chatterjee, N Dethlefs - Renewable and Sustainable Energy Reviews, 2021 - Elsevier
Wind energy has emerged as a highly promising source of renewable energy in recent
times. However, wind turbines regularly suffer from operational inconsistencies, leading to …

Efficient memory management for large language model serving with PagedAttention

W Kwon, Z Li, S Zhuang, Y Sheng, L Zheng… - Proceedings of the 29th …, 2023 - dl.acm.org
High throughput serving of large language models (LLMs) requires batching sufficiently
many requests at a time. However, existing systems struggle because the key-value cache …
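
The title hints at the mechanism: the KV cache is managed in fixed-size blocks, analogous to OS paging, so memory is allocated on demand rather than reserved per maximum sequence length. A toy sketch of that idea, with hypothetical class and method names (this is not vLLM's actual API):

```python
class PagedKVCache:
    """Toy block table in the spirit of PagedAttention: each sequence's
    KV cache is a list of fixed-size physical blocks allocated lazily."""

    def __init__(self, num_blocks, block_size):
        self.block_size = block_size
        self.free_blocks = list(range(num_blocks))  # physical block ids
        self.block_tables = {}                      # seq_id -> [block ids]
        self.seq_lens = {}                          # seq_id -> tokens stored

    def append_token(self, seq_id):
        """Reserve room for one more token; allocate only on a block boundary."""
        n = self.seq_lens.get(seq_id, 0)
        if n % self.block_size == 0:                # current block is full
            if not self.free_blocks:
                raise MemoryError("KV cache exhausted; preempt a sequence")
            self.block_tables.setdefault(seq_id, []).append(self.free_blocks.pop())
        self.seq_lens[seq_id] = n + 1

    def free(self, seq_id):
        """Return a finished sequence's blocks to the free pool."""
        self.free_blocks.extend(self.block_tables.pop(seq_id, []))
        self.seq_lens.pop(seq_id, None)

cache = PagedKVCache(num_blocks=8, block_size=16)
for _ in range(20):
    cache.append_token("req-0")       # 20 tokens occupy just 2 blocks,
print(cache.block_tables["req-0"])    # not a max-length reservation
```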

Efficient large-scale language model training on GPU clusters using Megatron-LM

D Narayanan, M Shoeybi, J Casper… - Proceedings of the …, 2021 - dl.acm.org
Large language models have led to state-of-the-art accuracies across several tasks.
However, training these models efficiently is challenging because: a) GPU memory capacity …
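
One ingredient of Megatron-LM-style training is tensor model parallelism, where a layer's weight matrix is sharded across GPUs. A NumPy simulation of a column-parallel linear layer, with devices simulated by plain arrays and the all-gather by concatenation; this is a sketch of the idea, not the library's API:

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, n_dev = 8, 12, 4

W = rng.normal(size=(d_in, d_out))
shards = np.split(W, n_dev, axis=1)   # each "device" holds d_out/n_dev columns

x = rng.normal(size=(2, d_in))        # a batch of activations
partial = [x @ Wi for Wi in shards]   # computed independently per device
y = np.concatenate(partial, axis=1)   # all-gather along the hidden dimension

assert np.allclose(y, x @ W)          # matches the unsharded result
```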

Putting NeRF on a diet: Semantically consistent few-shot view synthesis

A Jain, M Tancik, P Abbeel - Proceedings of the IEEE/CVF …, 2021 - openaccess.thecvf.com
We present DietNeRF, a 3D neural scene representation estimated from a few images.
Neural Radiance Fields (NeRF) learn a continuous volumetric representation of a scene …
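
For reference, the volume rendering integral behind the "continuous volumetric representation" mentioned here, in the standard NeRF formulation (not specific to DietNeRF):

```latex
% Expected color of camera ray r(t) = o + t d, accumulated between
% near and far bounds t_n, t_f; sigma is density, c is view-dependent color:
C(\mathbf{r}) = \int_{t_n}^{t_f} T(t)\,\sigma(\mathbf{r}(t))\,
                \mathbf{c}(\mathbf{r}(t), \mathbf{d})\,dt,
\qquad
T(t) = \exp\!\Big(-\int_{t_n}^{t} \sigma(\mathbf{r}(s))\,ds\Big)
```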

ZeRO-Infinity: Breaking the GPU memory wall for extreme scale deep learning

S Rajbhandari, O Ruwase, J Rasley, S Smith… - Proceedings of the …, 2021 - dl.acm.org
In the last three years, the largest dense deep learning models have grown over 1000x to
reach hundreds of billions of parameters, while the GPU memory has only grown by 5x (16 …
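
A back-of-the-envelope calculation shows why this gap forces offloading beyond GPU memory. The 500B-parameter model below is a hypothetical figure, and the 16-bytes-per-parameter accounting assumes mixed-precision Adam training:

```python
# Why partitioning across GPUs alone eventually runs out of memory:
# model state for mixed-precision Adam is ~16 bytes per parameter.
n_params = 500e9                      # hypothetical 500B-parameter model
state_tb = n_params * 16 / 2**40
print(f"model state: {state_tb:.1f} TB")     # ~7.3 TB
print(f"one 80 GB GPU: {80 / 1024:.3f} TB")  # ~0.078 TB
# ZeRO-Infinity's response: keep partitioning across GPUs, then spill
# what remains to CPU DRAM and NVMe, trading bandwidth for capacity.
```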

ZeRO: Memory optimizations toward training trillion parameter models

S Rajbhandari, J Rasley, O Ruwase… - … Conference for High …, 2020 - ieeexplore.ieee.org
Large deep learning models offer significant accuracy gains, but training billions to trillions
of parameters is challenging. Existing solutions such as data and model parallelisms exhibit …
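
ZeRO's three stages progressively partition optimizer states, gradients, and parameters across data-parallel ranks. A sketch of the per-GPU memory accounting, following the paper's 2 + 2 + K bytes-per-parameter model with K = 12 for mixed-precision Adam; the 7.5B-parameter, 64-GPU numbers are illustrative:

```python
# Per-GPU model-state memory under ZeRO's stages: 2 bytes (fp16 params)
# + 2 bytes (fp16 grads) + K bytes of optimizer state per parameter,
# with K = 12 (fp32 master weights, Adam momentum and variance).
def zero_model_state_gb(n_params, n_gpus, stage):
    K = 12
    if stage == 0:                      # plain data parallelism: full replica
        per_param = 2 + 2 + K
    elif stage == 1:                    # partition optimizer states
        per_param = 2 + 2 + K / n_gpus
    elif stage == 2:                    # ... and gradients
        per_param = 2 + (2 + K) / n_gpus
    else:                               # stage 3: ... and parameters too
        per_param = (2 + 2 + K) / n_gpus
    return n_params * per_param / 2**30

for stage in range(4):
    gb = zero_model_state_gb(7.5e9, n_gpus=64, stage=stage)
    print(f"stage {stage}: {gb:.1f} GB per GPU")
# stage 0: ~111.8 GB (won't fit on one GPU); stage 3: ~1.7 GB
```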

Combined scaling for zero-shot transfer learning

H Pham, Z Dai, G Ghiasi, K Kawaguchi, H Liu, AW Yu… - Neurocomputing, 2023 - Elsevier
Recent developments in multimodal training methodologies, including CLIP and ALIGN,
obviate the necessity for individual data labeling. These approaches utilize pairs of data and …
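
CLIP/ALIGN-style training pairs each image with its caption and scores a batch with a symmetric contrastive loss, where matched pairs sit on the diagonal of the similarity matrix. A minimal NumPy sketch of that loss; the function name, temperature, and toy inputs are illustrative:

```python
import numpy as np

def clip_style_loss(img_emb, txt_emb, temperature=0.07):
    """Symmetric contrastive loss over paired image/text embeddings:
    the matched pair in each row/column is the positive class."""
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    logits = img @ txt.T / temperature           # (batch, batch) similarities

    def xent_diag(l):                            # cross-entropy, labels = diagonal
        l = l - l.max(axis=1, keepdims=True)     # numerical stability
        log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(log_probs))

    return (xent_diag(logits) + xent_diag(logits.T)) / 2

rng = np.random.default_rng(0)
print(clip_style_loss(rng.normal(size=(8, 64)), rng.normal(size=(8, 64))))
```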

Train big, then compress: Rethinking model size for efficient training and inference of transformers

Z Li, E Wallace, S Shen, K Lin… - International …, 2020 - proceedings.mlr.press
Since hardware resources are limited, the objective of training deep learning models is
typically to maximize accuracy subject to the time and memory constraints of training and …
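
The paper's thesis is to train a deliberately large model, then shrink it for deployment. One common compression step is pruning; the sketch below shows simple magnitude pruning and is illustrative rather than the paper's exact recipe:

```python
import numpy as np

def magnitude_prune(weights, sparsity=0.9):
    """Zero out the smallest-magnitude weights, keeping only the
    top (1 - sparsity) fraction by absolute value."""
    threshold = np.quantile(np.abs(weights).ravel(), sparsity)
    return np.where(np.abs(weights) >= threshold, weights, 0.0)

rng = np.random.default_rng(0)
W = rng.normal(size=(256, 256))
W_sparse = magnitude_prune(W, sparsity=0.9)
print(f"{(W_sparse == 0).mean():.0%} of weights removed")
```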

DAPPLE: A pipelined data parallel approach for training large models

S Fan, Y Rong, C Meng, Z Cao, S Wang… - Proceedings of the 26th …, 2021 - dl.acm.org
It is a challenging task to train large DNN models on sophisticated GPU platforms with
diversified interconnect capabilities. Recently, pipelined training has been proposed as an …
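
A toy model of the pipelining this abstract refers to: a minibatch is split into microbatches that stream through the stages concurrently, so stage s works on microbatch m while earlier stages already process later microbatches. This shows only the forward passes of a generic fill-drain schedule, not DAPPLE's actual interleaved forward/backward scheduler:

```python
def pipeline_schedule(num_stages, num_microbatches):
    """Yield (clock_tick, stage, microbatch) for the forward passes
    of a simple fill-drain pipeline."""
    for t in range(num_stages + num_microbatches - 1):
        for s in range(num_stages):
            m = t - s
            if 0 <= m < num_microbatches:
                yield t, s, m

for t, s, m in pipeline_schedule(num_stages=3, num_microbatches=4):
    print(f"tick {t}: stage {s} runs forward on microbatch {m}")
# 3 stages finish 4 microbatches in 6 ticks instead of 12 sequential ones.
```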