On the opportunities and risks of foundation models

R Bommasani, DA Hudson, E Adeli, R Altman… - arXiv preprint arXiv …, 2021 - arxiv.org
AI is undergoing a paradigm shift with the rise of models (e.g., BERT, DALL-E, GPT-3) that are
trained on broad data at scale and are adaptable to a wide range of downstream tasks. We …

Pytorch fsdp: experiences on scaling fully sharded data parallel

Y Zhao, A Gu, R Varma, L Luo, CC Huang, M Xu… - arXiv preprint arXiv …, 2023 - arxiv.org
It is widely acknowledged that large models have the potential to deliver superior
performance across a broad range of domains. Despite the remarkable progress made in …
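
For context, a minimal sketch of wrapping a model with PyTorch's FullyShardedDataParallel, which shards parameters, gradients, and optimizer state across ranks. It assumes a torch.distributed process group has already been initialized (e.g. via torchrun); the model and sizes are placeholders, not the paper's setup.

```python
import torch
import torch.nn as nn
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

# Placeholder model; FSDP shards its parameters across all ranks.
model = nn.Sequential(nn.Linear(1024, 4096), nn.ReLU(), nn.Linear(4096, 1024))
model = FSDP(model.cuda())

optim = torch.optim.Adam(model.parameters(), lr=1e-4)
x = torch.randn(8, 1024, device="cuda")
loss = model(x).sum()
loss.backward()   # gradients are reduce-scattered to their owning shards
optim.step()      # each rank updates only its own shard of the parameters
```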

Random Offset Block Embedding (ROBE) for compressed embedding tables in deep learning recommendation systems

A Desai, L Chou, A Shrivastava - Proceedings of Machine …, 2022 - proceedings.mlsys.org
Deep learning for recommendation data is one of the most pervasive and challenging AI
workloads in recent times. State-of-the-art recommendation models are among the largest …
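
The core idea is that all embedding rows share one small parameter array, with each row mapped into it by hashing in blocks. A toy sketch in that spirit follows; the hashing scheme, sizes, and class name are illustrative assumptions, not the paper's exact construction.

```python
import torch
import torch.nn as nn

class HashedEmbedding(nn.Module):
    def __init__(self, num_rows, dim, compressed_size, block=8, seed=0):
        super().__init__()
        self.dim, self.block = dim, block
        # One small shared parameter array backs every logical embedding row.
        self.weights = nn.Parameter(torch.randn(compressed_size) * 0.01)
        g = torch.Generator().manual_seed(seed)
        # Random offset per (row, block) position; a real implementation would
        # compute these with a universal hash on the fly rather than store them.
        self.register_buffer(
            "offsets",
            torch.randint(0, compressed_size, (num_rows, dim // block), generator=g),
        )

    def forward(self, ids):                # ids: (batch,)
        base = self.offsets[ids]           # (batch, dim // block)
        idx = base.unsqueeze(-1) + torch.arange(self.block, device=base.device)
        idx = idx.remainder(self.weights.numel())   # wrap around the shared array
        return self.weights[idx].reshape(ids.shape[0], self.dim)

emb = HashedEmbedding(num_rows=10**6, dim=64, compressed_size=2**20)
vecs = emb(torch.tensor([3, 42, 999_999]))  # (3, 64) vectors from ~1M shared weights
```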

Training personalized recommendation systems from (GPU) scratch: Look forward not backwards

Y Kwon, M Rhu - Proceedings of the 49th Annual International …, 2022 - dl.acm.org
Personalized recommendation models (RecSys) are one of the most popular machine
learning workloads serviced by hyperscalers. A critical challenge of training RecSys is its …

A survey on auto-parallelism of large-scale deep learning training

P Liang, Y Tang, X Zhang, Y Bai, T Su… - … on Parallel and …, 2023 - ieeexplore.ieee.org
Deep learning (DL) has achieved great success in recent years, leading to state-of-the-art
performance in the research community and in industrial fields like computer vision and natural …

Enabling compute-communication overlap in distributed deep learning training platforms

S Rashidi, M Denton, S Sridharan… - 2021 ACM/IEEE 48th …, 2021 - ieeexplore.ieee.org
Deep Learning (DL) training platforms are built by interconnecting multiple DL accelerators
(e.g., GPU/TPU) via fast, customized interconnects with hundreds of gigabytes per second (GB/s) of bandwidth …
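
The basic pattern such platforms exploit is launching collectives asynchronously so communication proceeds while independent compute runs. A toy PyTorch illustration, assuming torch.distributed is initialized with a NCCL backend; the tensor names are placeholders.

```python
import torch
import torch.distributed as dist

def overlapped_step(grads: torch.Tensor, activations: torch.Tensor):
    work = dist.all_reduce(grads, async_op=True)  # communication kicks off immediately
    out = activations @ activations.T             # independent compute overlaps with it
    work.wait()                                   # block only when the reduced result is needed
    return out, grads
```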

Persia: An open, hybrid system scaling deep learning-based recommenders up to 100 trillion parameters

X Lian, B Yuan, X Zhu, Y Wang, Y He, H Wu… - Proceedings of the 28th …, 2022 - dl.acm.org
Recent years have witnessed exponential growth in model scale in deep learning-based
recommender systems: from Google's 2016 model with 1 billion parameters to the latest …

The trade-offs of model size in large recommendation models: 100 GB to 10 MB Criteo-TB DLRM model

A Desai, A Shrivastava - Advances in Neural Information …, 2022 - proceedings.neurips.cc
Embedding tables dominate industrial-scale recommendation model sizes, using up to
terabytes of memory. A popular, and the largest, publicly available MLPerf machine learning …
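
A back-of-envelope calculation shows why embedding tables dominate: memory scales with rows × dimension × bytes per weight. The numbers below are illustrative, not the paper's configuration.

```python
rows, dim, bytes_per_float = 500_000_000, 64, 4       # illustrative table shape
size_gib = rows * dim * bytes_per_float / 2**30
print(f"one table: {size_gib:.0f} GiB")               # ~119 GiB before any compression
```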

EmbedX: A Versatile, Efficient and Scalable Platform to Embed Both Graphs and High-Dimensional Sparse Data

Y Zou, Z Ding, J Shi, S Guo, C Su, Y Zhang - Proceedings of the VLDB …, 2023 - dl.acm.org
In modern online services, it is of growing importance to process web-scale graph data and
high-dimensional sparse data together into embeddings for downstream tasks, such as …
