Sustainable AI: Environmental implications, challenges and opportunities

CJ Wu, R Raghavendra, U Gupta… - Proceedings of …, 2022 - proceedings.mlsys.org
This paper explores the environmental impact of the super-linear growth trends for AI from a
holistic perspective, spanning Data, Algorithms, and System Hardware. We characterize the …

A metaverse: Taxonomy, components, applications, and open challenges

SM Park, YG Kim - IEEE Access, 2022 - ieeexplore.ieee.org
Unlike previous studies on the Metaverse based on Second Life, the current Metaverse is
based on the social value of Generation Z that online and offline selves are not different …

Cores that don't count

PH Hochschild, P Turner, JC Mogul… - Proceedings of the …, 2021 - dl.acm.org
We are accustomed to thinking of computers as fail-stop, especially the cores that execute
instructions, and most system software implicitly relies on that assumption. During most of …

RocksDB: Evolution of development priorities in a key-value store serving large-scale applications

S Dong, A Kryczka, Y Jin, M Stumm - ACM Transactions on Storage (TOS …, 2021 - dl.acm.org
This article is an eight-year retrospective on development priorities for RocksDB, a key-value
store developed at Facebook that targets large-scale distributed systems and that is …

Understanding silent data corruptions in a large production CPU population

S Wang, G Zhang, J Wei, Y Wang, J Wu… - Proceedings of the 29th …, 2023 - dl.acm.org
Silent Data Corruption (SDC) in processors can lead to various application-level issues,
such as incorrect calculations and even data loss. Since traditional techniques are not …

MOESI-prime: preventing coherence-induced hammering in commodity workloads

K Loughlin, S Saroiu, A Wolman, YA Manerkar… - Proceedings of the 49th …, 2022 - dl.acm.org
Prior work shows that Rowhammer attacks---which flip bits in DRAM via frequent activations
of the same row(s)---are viable. Adversaries typically mount these attacks via instruction …

Understanding and mitigating hardware failures in deep learning training systems

Y He, M Hutton, S Chan, R De Gruijl… - Proceedings of the 50th …, 2023 - dl.acm.org
Deep neural network (DNN) training workloads are increasingly susceptible to hardware
failures in datacenters. For example, Google experienced "mysterious, difficult to identify …

Silent data corruptions: Microarchitectural perspectives

G Papadimitriou, D Gizopoulos - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
Today more than ever before, academia, manufacturers, and hyperscalers acknowledge the
major challenge of silent data corruptions (SDCs) and aim on solutions to minimize its …

Impact of voltage scaling on soft errors susceptibility of multicore server CPUs

D Agiakatsikas, G Papadimitriou, V Karakostas… - Proceedings of the 56th …, 2023 - dl.acm.org
Microprocessor power consumption and dependability are both crucial challenges that
designers have to cope with due to shrinking feature sizes and increasing transistor counts …

SmartOClock: Workload- and risk-aware overclocking in the cloud

J Stojkovic, PA Misra, Í Goiri, S Whitlock… - 2024 ACM/IEEE 51st …, 2024 - ieeexplore.ieee.org
Operating server components beyond their voltage and power design limit (i.e., overclocking)
enables improving performance and lowering cost for cloud workloads. However …