SHARP: A short-word hierarchical accelerator for robust and practical fully homomorphic encryption

J Kim, S Kim, J Choi, J Park, D Kim… - Proceedings of the 50th …, 2023 - dl.acm.org
Fully homomorphic encryption (FHE) is an emerging cryptographic technology that
guarantees the privacy of sensitive user data by enabling direct computations on encrypted …

Bingo spatial data prefetcher

M Bakhshalipour, M Shakerinava… - … Symposium on High …, 2019 - ieeexplore.ieee.org
Applications extensively use data objects with a regular and fixed layout, which leads to the
recurrence of access patterns over memory regions. Spatial data prefetching techniques …

Evaluation of hardware data prefetchers on server processors

M Bakhshalipour, S Tabaeiaghdaei… - ACM Computing …, 2019 - dl.acm.org
Data prefetching, i.e., the act of predicting an application's future memory accesses and
fetching those that are not in the on-chip caches, is a well-known and widely used approach …

GPU-NEST: Characterizing energy efficiency of multi-GPU inference servers

A Jahanshahi, HZ Sabzi, C Lau… - IEEE Computer …, 2020 - ieeexplore.ieee.org
Cloud inference systems have recently emerged as a solution to the ever-increasing
integration of AI-powered applications into the smart devices around us. The wide adoption …

Enhancing server efficiency in the face of killer microseconds

A Mirhosseini, A Sriraman… - 2019 IEEE International …, 2019 - ieeexplore.ieee.org
We are entering an era of “killer microseconds” in data center applications. Killer
microseconds refer to μs-scale “holes” in CPU schedules caused by stalls to access fast I/O …

BlockMaestro: Enabling programmer-transparent task-based execution in GPU systems

AA Abdolrashidi, HA Esfeden… - 2021 ACM/IEEE 48th …, 2021 - ieeexplore.ieee.org
As modern GPU workloads grow in size and complexity, there is an ever-increasing demand
for GPU computational power. Emerging workloads contain hundreds or thousands of GPU …

BOW: Breathing operand windows to exploit bypassing in GPUs

HA Esfeden, A Abdolrashidi, S Rahman… - 2020 53rd Annual …, 2020 - ieeexplore.ieee.org
The Register File (RF) is a critical structure in Graphics Processing Units (GPUs) responsible
for a large portion of the area and power. To simplify the architecture of the RF, it is …

READY: A fine-grained multithreading overlay framework for modern CPU-FPGA dataflow applications

LBD Silva, R Ferreira, M Canesche… - ACM Transactions on …, 2019 - dl.acm.org
In this work, we propose a framework called REconfigurable Accelerator DeploY (READY),
the first framework to support polynomial runtime mapping of dataflow applications in high …

OSM: Off-chip shared memory for GPUs

S Darabi, E Yousefzadeh-Asl-Miandoab… - … on Parallel and …, 2022 - ieeexplore.ieee.org
Graphics Processing Units (GPUs) employ a shared memory, a software-managed cache for
programmers, in each streaming multiprocessor to accelerate data sharing among the …

High performance and power efficient accelerator for cloud inference

J Yao, H Zhou, Y Zhang, Y Li, C Feng… - … Symposium on High …, 2023 - ieeexplore.ieee.org
Facing the growing complexity of Deep Neural Networks (DNNs), high-performance and
power-efficient AI accelerators are desired to provide effective and affordable cloud …