On the opportunities and risks of foundation models

R Bommasani, DA Hudson, E Adeli, R Altman… - arXiv preprint arXiv …, 2021 - arxiv.org
AI is undergoing a paradigm shift with the rise of models (e.g., BERT, DALL-E, GPT-3) that are
trained on broad data at scale and are adaptable to a wide range of downstream tasks. We …

Pytorch fsdp: experiences on scaling fully sharded data parallel

Y Zhao, A Gu, R Varma, L Luo, CC Huang, M Xu… - arXiv preprint arXiv …, 2023 - arxiv.org
It is widely acknowledged that large models have the potential to deliver superior
performance across a broad range of domains. Despite the remarkable progress made in …
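
For context, a minimal sketch of wrapping a model with PyTorch's FullyShardedDataParallel, which shards parameters, gradients, and optimizer state across ranks. It assumes a torch.distributed process group has already been initialized (e.g. via torchrun); the model and sizes are placeholders, not the paper's setup.

```python
import torch
import torch.nn as nn
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

# Placeholder model; FSDP shards its parameters across all ranks.
model = nn.Sequential(nn.Linear(1024, 4096), nn.ReLU(), nn.Linear(4096, 1024))
model = FSDP(model.cuda())

optim = torch.optim.Adam(model.parameters(), lr=1e-4)
x = torch.randn(8, 1024, device="cuda")
loss = model(x).sum()
loss.backward()   # gradients are reduce-scattered to their owning shards
optim.step()      # each rank updates only its own shard of the parameters
```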

Random Offset Block Embedding (ROBE) for compressed embedding tables in deep learning recommendation systems

A Desai, L Chou, A Shrivastava - Proceedings of Machine …, 2022 - proceedings.mlsys.org
Deep learning for recommendation data is one of the most pervasive and challenging AI
workloads in recent times. State-of-the-art recommendation models are among the largest …
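
The core idea is that all embedding rows share one small parameter array, with each row mapped into it by hashing in blocks. A toy sketch in that spirit follows; the hashing scheme, sizes, and class name are illustrative assumptions, not the paper's exact construction.

```python
import torch
import torch.nn as nn

class HashedEmbedding(nn.Module):
    def __init__(self, num_rows, dim, compressed_size, block=8, seed=0):
        super().__init__()
        self.dim, self.block = dim, block
        # One small shared parameter array backs every logical embedding row.
        self.weights = nn.Parameter(torch.randn(compressed_size) * 0.01)
        g = torch.Generator().manual_seed(seed)
        # Random offset per (row, block) position; a real implementation would
        # compute these with a universal hash on the fly rather than store them.
        self.register_buffer(
            "offsets",
            torch.randint(0, compressed_size, (num_rows, dim // block), generator=g),
        )

    def forward(self, ids):                # ids: (batch,)
        base = self.offsets[ids]           # (batch, dim // block)
        idx = base.unsqueeze(-1) + torch.arange(self.block, device=base.device)
        idx = idx.remainder(self.weights.numel())   # wrap around the shared array
        return self.weights[idx].reshape(ids.shape[0], self.dim)

emb = HashedEmbedding(num_rows=10**6, dim=64, compressed_size=2**20)
vecs = emb(torch.tensor([3, 42, 999_999]))  # (3, 64) vectors from ~1M shared weights
```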

Training personalized recommendation systems from (GPU) scratch: Look forward not backwards

Y Kwon, M Rhu - Proceedings of the 49th Annual International …, 2022 - dl.acm.org
Personalized recommendation models (RecSys) are one of the most popular machine
learning workloads serviced by hyperscalers. A critical challenge of training RecSys is its …

A survey on auto-parallelism of large-scale deep learning training

P Liang, Y Tang, X Zhang, Y Bai, T Su… - … on Parallel and …, 2023 - ieeexplore.ieee.org
Deep learning (DL) has achieved great success in recent years, leading to state-of-the-art
performance in the research community and in industrial fields like computer vision and natural …

Enabling compute-communication overlap in distributed deep learning training platforms

S Rashidi, M Denton, S Sridharan… - 2021 ACM/IEEE 48th …, 2021 - ieeexplore.ieee.org
Deep Learning (DL) training platforms are built by interconnecting multiple DL accelerators
(e.g., GPU/TPU) via fast, customized interconnects with hundreds of gigabytes per second (GB/s) of bandwidth …
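
The basic pattern such platforms exploit is launching collectives asynchronously so communication proceeds while independent compute runs. A toy PyTorch illustration, assuming torch.distributed is initialized with a NCCL backend; the tensor names are placeholders.

```python
import torch
import torch.distributed as dist

def overlapped_step(grads: torch.Tensor, activations: torch.Tensor):
    work = dist.all_reduce(grads, async_op=True)  # communication kicks off immediately
    out = activations @ activations.T             # independent compute overlaps with it
    work.wait()                                   # block only when the reduced result is needed
    return out, grads
```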

Persia: An open, hybrid system scaling deep learning-based recommenders up to 100 trillion parameters

X Lian, B Yuan, X Zhu, Y Wang, Y He, H Wu… - Proceedings of the 28th …, 2022 - dl.acm.org
Recent years have witnessed exponential growth in model scale in deep learning-based
recommender systems: from Google's 2016 model with 1 billion parameters to the latest …

The trade-offs of model size in large recommendation models: 100 GB to 10 MB Criteo-TB DLRM model

A Desai, A Shrivastava - Advances in Neural Information …, 2022 - proceedings.neurips.cc
Embedding tables dominate industrial-scale recommendation model sizes, using up to
terabytes of memory. A popular, and the largest, publicly available MLPerf machine learning …
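
A back-of-envelope calculation shows why embedding tables dominate: memory scales with rows × dimension × bytes per weight. The numbers below are illustrative, not the paper's configuration.

```python
rows, dim, bytes_per_float = 500_000_000, 64, 4       # illustrative table shape
size_gib = rows * dim * bytes_per_float / 2**30
print(f"one table: {size_gib:.0f} GiB")               # ~119 GiB before any compression
```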

EmbedX: A Versatile, Efficient and Scalable Platform to Embed Both Graphs and High-Dimensional Sparse Data

Y Zou, Z Ding, J Shi, S Guo, C Su, Y Zhang - Proceedings of the VLDB …, 2023 - dl.acm.org
In modern online services, it is of growing importance to process web-scale graph data and
high-dimensional sparse data together into embeddings for downstream tasks, such as …
