Software-hardware co-design for fast and scalable training of deep learning recommendation models

D Mudigere, Y Hao, J Huang, Z Jia, A Tulloch… - Proceedings of the 49th …, 2022 - dl.acm.org
Deep learning recommendation models (DLRMs) have been used across many
business-critical services at Meta and are the single largest AI application in terms of infrastructure …

TRiM: Enhancing processor-memory interfaces with scalable tensor reduction in memory

J Park, B Kim, S Yun, E Lee, M Rhu… - MICRO-54: 54th Annual …, 2021 - dl.acm.org
Personalized recommendation systems are gaining significant traction due to their industrial
importance. An important building block of recommendation systems consists of the …

Stratix 10 NX architecture and applications

M Langhammer, E Nurvitadhi, B Pasca… - The 2021 ACM/SIGDA …, 2021 - dl.acm.org
The advent of AI has driven the adoption of high density low precision arithmetic on FPGAs.
This has resulted in new methods in mapping both arithmetic functions as well as dataflows …

Cross-stack workload characterization of deep recommendation systems

S Hsia, U Gupta, M Wilkening, CJ Wu… - 2020 IEEE …, 2020 - ieeexplore.ieee.org
Deep learning based recommendation systems form the backbone of most personalized
cloud services. Though the computer architecture community has recently started to take …

Kairos: Building cost-efficient machine learning inference systems with heterogeneous cloud resources

B Li, S Samsi, V Gadepally, D Tiwari - Proceedings of the 32nd …, 2023 - dl.acm.org
Online inference is becoming a key service product for many businesses, deployed in cloud
platforms to meet customer demands. Despite their revenue-generation capability, these …

Heterogeneous acceleration pipeline for recommendation system training

M Adnan, YE Maboud, D Mahajan… - 2024 ACM/IEEE 51st …, 2024 - ieeexplore.ieee.org
Recommendation models rely on deep learning networks and large embedding tables,
resulting in computationally and memory-intensive processes. These models are typically …

Ribbon: cost-effective and QoS-aware deep learning model inference using a diverse pool of cloud computing instances

B Li, RB Roy, T Patel, V Gadepally, K Gettings… - Proceedings of the …, 2021 - dl.acm.org
Deep learning model inference is a key service in many businesses and scientific discovery
processes. This paper introduces Ribbon, a novel deep learning inference serving system …

Tensor processing primitives: A programming abstraction for efficiency and portability in deep learning workloads

E Georganas, D Kalamkar, S Avancha… - Proceedings of the …, 2021 - dl.acm.org
During the past decade, novel Deep Learning (DL) algorithms/workloads and hardware
have been developed to tackle a wide range of problems. Despite the advances in …

Accelerating Personalized Recommendation with Cross-level Near-Memory Processing

H Liu, L Zheng, Y Huang, C Liu, X Ye, J Yuan… - Proceedings of the 50th …, 2023 - dl.acm.org
The memory-intensive embedding layers of the personalized recommendation systems are
the performance bottleneck as they demand large memory bandwidth and exhibit irregular …

The trade-offs of model size in large recommendation models: 100GB to 10MB Criteo-TB DLRM model

A Desai, A Shrivastava - Advances in Neural Information …, 2022 - proceedings.neurips.cc
Embedding tables dominate industrial-scale recommendation model sizes, using up to
terabytes of memory. A popular and the largest publicly available machine learning MLPerf …