Sustainable AI: Environmental implications, challenges and opportunities

CJ Wu, R Raghavendra, U Gupta… - Proceedings of …, 2022 - proceedings.mlsys.org
This paper explores the environmental impact of the super-linear growth trends for AI from a
holistic perspective, spanning Data, Algorithms, and System Hardware. We characterize the …

A Survey of Design and Optimization for Systolic Array-based DNN Accelerators

R Xu, S Ma, Y Guo, D Li - ACM Computing Surveys, 2023 - dl.acm.org
In recent years, it has been witnessed that the systolic array is a successful architecture for
DNN hardware accelerators. However, the design of systolic arrays also encountered many …

SpAtten: Efficient sparse attention architecture with cascade token and head pruning

H Wang, Z Zhang, S Han - 2021 IEEE International Symposium …, 2021 - ieeexplore.ieee.org
The attention mechanism is becoming increasingly popular in Natural Language Processing
(NLP) applications, showing superior performance than convolutional and recurrent …

The architectural implications of Facebook's DNN-based personalized recommendation

U Gupta, CJ Wu, X Wang, M Naumov… - … Symposium on High …, 2020 - ieeexplore.ieee.org
The widespread application of deep learning has changed the landscape of computation in
data centers. In particular, personalized recommendation for content ranking is now largely …

DAMOV: A new methodology and benchmark suite for evaluating data movement bottlenecks

GF Oliveira, J Gómez-Luna, L Orosa, S Ghose… - IEEE …, 2021 - ieeexplore.ieee.org
Data movement between the CPU and main memory is a first-order obstacle against improving
performance, scalability, and energy efficiency in modern systems. Computer systems …

Google neural network models for edge devices: Analyzing and mitigating machine learning inference bottlenecks

A Boroumand, S Ghose, B Akin… - 2021 30th …, 2021 - ieeexplore.ieee.org
Emerging edge computing platforms often contain machine learning (ML) accelerators that
can accelerate inference for a wide range of neural network (NN) models. These models are …

RecSSD: near data processing for solid state drive based recommendation inference

M Wilkening, U Gupta, S Hsia, C Trippel… - Proceedings of the 26th …, 2021 - dl.acm.org
Neural personalized recommendation models are used across a wide variety of datacenter
applications including search, social media, and entertainment. State-of-the-art models …

Understanding training efficiency of deep learning recommendation models at scale

B Acun, M Murphy, X Wang, J Nie… - … Symposium on High …, 2021 - ieeexplore.ieee.org
The use of GPUs has proliferated for machine learning workflows and is now considered
mainstream for many deep learning models. Meanwhile, when training state-of-the-art …

Near-memory processing in action: Accelerating personalized recommendation with AxDIMM

L Ke, X Zhang, J So, JG Lee, SH Kang, S Lee… - IEEE Micro, 2021 - ieeexplore.ieee.org
Near-memory processing (NMP) is a prospective paradigm enabling memory-centric
computing. By moving the compute capability next to the main memory (DRAM modules), it …

FAFNIR: Accelerating sparse gathering by using efficient near-memory intelligent reduction

B Asgari, R Hadidi, J Cao, SK Lim… - 2021 IEEE International …, 2021 - ieeexplore.ieee.org
Memory-bound sparse gathering, caused by irregular random memory accesses, has
become an obstacle in several on-demand applications such as embedding lookup in …