CraterLake: a hardware accelerator for efficient unbounded computation on encrypted data

N Samardzic, A Feldmann, A Krastev… - Proceedings of the 49th …, 2022 - dl.acm.org
Fully Homomorphic Encryption (FHE) enables offloading computation to untrusted servers
with cryptographic privacy. Despite its attractive security, FHE is not yet widely adopted due …

Full stack optimization of transformer inference: a survey

S Kim, C Hooper, T Wattanawong, M Kang… - arXiv preprint arXiv …, 2023 - arxiv.org
Recent advances in state-of-the-art DNN architecture design have been moving toward
Transformer models. These models achieve superior accuracy across a wide range of …

AMOS: enabling automatic mapping for tensor computations on spatial accelerators with hardware abstraction

S Zheng, R Chen, A Wei, Y Jin, Q Han, L Lu… - Proceedings of the 49th …, 2022 - dl.acm.org
Hardware specialization is a promising trend to sustain performance growth. Spatial
hardware accelerators that employ specialized and hierarchical computation and memory …

Sparseloop: An analytical approach to sparse tensor accelerator modeling

YN Wu, PA Tsai, A Parashar, V Sze… - 2022 55th IEEE/ACM …, 2022 - ieeexplore.ieee.org
In recent years, many accelerators have been proposed to efficiently process sparse tensor
algebra applications (e.g., sparse neural networks). However, these proposals are single …

TileFlow: A framework for modeling fusion dataflow via tree-based analysis

S Zheng, S Chen, S Gao, L Jia, G Sun… - Proceedings of the 56th …, 2023 - dl.acm.org
With the increasing size of DNN models and the growing discrepancy between compute
performance and memory bandwidth, fusing multiple layers together to reduce off-chip …

Inter-layer scheduling space definition and exploration for tiled accelerators

J Cai, Y Wei, Z Wu, S Peng, K Ma - Proceedings of the 50th Annual …, 2023 - dl.acm.org
With the continuous expansion of the DNN accelerator scale, inter-layer scheduling, which
studies the allocation of computing resources to each layer and the computing order of all …

Chimera: An analytical optimizing framework for effective compute-intensive operators fusion

S Zheng, S Chen, P Song, R Chen, X Li… - … Symposium on High …, 2023 - ieeexplore.ieee.org
Machine learning models with various tensor operators are becoming ubiquitous in recent
years. There are two types of operators in machine learning: compute-intensive operators …

DOSA: Differentiable model-based one-loop search for DNN accelerators

C Hong, Q Huang, G Dinh, M Subedar… - Proceedings of the 56th …, 2023 - dl.acm.org
In the hardware design space exploration process, it is critical to optimize both hardware
parameters and algorithm-to-hardware mappings. Previous work has largely approached …

MAGMA: An optimization framework for mapping multiple DNNs on multiple accelerator cores

SC Kao, T Krishna - 2022 IEEE International Symposium on …, 2022 - ieeexplore.ieee.org
As Deep Learning continues to drive a variety of applications in edge and cloud data
centers, there is a growing trend towards building large accelerators with several sub …

TeAAL: A declarative framework for modeling sparse tensor accelerators

N Nayak, TO Odemuyiwa, S Ugare, C Fletcher… - Proceedings of the 56th …, 2023 - dl.acm.org
Over the past few years, the explosion in sparse tensor algebra workloads has led to a
corresponding rise in domain-specific accelerators to service them. Due to the irregularity …