OPER: Optimality-Guided Embedding Table Parallelization for Large-scale Recommendation Model

Z Wang, Y Wang, B Feng, G Huang… - 2024 USENIX Annual …, 2024 - usenix.org
The deployment of Deep Learning Recommendation Models (DLRMs) involves the
parallelization of extra-large embedding tables (EMTs) on multiple GPUs. Existing works …

Heterogeneous acceleration pipeline for recommendation system training

M Adnan, YE Maboud, D Mahajan… - 2024 ACM/IEEE 51st …, 2024 - ieeexplore.ieee.org
Recommendation models rely on deep learning networks and large embedding tables,
resulting in computationally and memory-intensive processes. These models are typically …

Scalability Limitations of Processing-in-Memory using Real System Evaluations

G Jonatan, H Cho, H Son, X Wu, N Livesay… - Proceedings of the …, 2024 - dl.acm.org
Processing-in-memory (PIM), where the compute is moved closer to the memory or the data,
has been widely explored to accelerate emerging workloads. Recently, different PIM-based …

Enabling efficient large recommendation model training with near CXL memory processing

H Liu, L Zheng, Y Huang, J Zhou, C Liu… - 2024 ACM/IEEE 51st …, 2024 - ieeexplore.ieee.org
Personalized recommendation systems have become one of the most important Internet
services nowadays. A critical challenge of training and deploying the recommendation …

RecFlex: Enabling Feature Heterogeneity-Aware Optimization for Deep Recommendation Models with Flexible Schedules

Z Pan, Z Zheng, F Zhang, B Xie, R Wu… - … Conference for High …, 2024 - ieeexplore.ieee.org
Industrial recommendation models typically involve numerous feature fields. The embedding
computation workloads are heterogeneous across these fields, thus requiring varied optimal …

Revisiting multi-dimensional classification from a dimension-wise perspective

Y Shi, H Ye, D Man, X Han, D Zhan, Y Jiang - Frontiers of Computer …, 2025 - Springer
Real-world objects exhibit intricate semantic properties that can be characterized from a
multitude of perspectives, which necessitates the development of a model capable of …

Accelerating Distributed DLRM Training with Optimized TT Decomposition and Micro-Batching

W Wang, Y Xia, D Yang, X Zhou… - … Conference for High …, 2024 - ieeexplore.ieee.org
Deep Learning Recommendation Models (DLRMs) are pivotal in various sectors, yet they
are hindered by the high memory demands of embedding tables and the significant …

Scalable Machine Learning Training Infrastructure for Online Ads Recommendation and Auction Scoring Modeling at Google

G Kurian, S Sardashti, R Sims, F Berger, G Holt… - arXiv preprint arXiv …, 2025 - arxiv.org
Large-scale Ads recommendation and auction scoring models at Google scale demand
immense computational resources. While specialized hardware like TPUs have improved …

Accelerating Communication in Deep Learning Recommendation Model Training with Dual-Level Adaptive Lossy Compression

H Feng, B Zhang, F Ye, M Si, CH Chu… - … Conference for High …, 2024 - ieeexplore.ieee.org
DLRM is a state-of-the-art recommendation system model that has gained widespread
adoption across various industry applications. The large size of DLRM models, however …

Rec-PF: Data-Driven Large-Scale Deep Learning Recommendation Model Training Optimization Based on Tensor-Train Embedding Table With Photovoltaic Forecast

Y Li, Z Wang, C Ren, X Hou… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
Photovoltaic (PV) power forecasting is important for promoting the integration of renewable
energy sources. However, neural network-based methods, particularly deep learning for PV …