AI accelerator on IBM Telum processor: Industrial product

C Lichtenau, A Buyuktosunoglu, R Bertran… - Proceedings of the 49th …, 2022 - dl.acm.org
IBM Telum is the next generation processor chip for IBM Z and LinuxONE systems. The
Telum design is focused on enterprise class workloads and it achieves over 40% per socket …

REED: Chiplet-based accelerator for fully homomorphic encryption

A Aikata, AC Mert, S Kwon, M Deryabin… - Cryptology ePrint …, 2023 - eprint.iacr.org
Fully Homomorphic Encryption (FHE) enables privacy-preserving computation and
has many applications. However, its practical implementation faces massive computation …

Enhancing interconnection network topology for chiplet-based systems: An automated design framework

Z Cao, Q Liu, Z Wan, W Zhang, K Song, W Liu - Future Generation …, 2025 - Elsevier
Chiplet-based systems integrate discrete chips on an interposer and use the interconnection
network to enable communication between different components. The topology of the …

HSAS: Efficient task scheduling for large scale heterogeneous systolic array accelerator cluster

K Yan, Y Song, T Liu, J Tan, X Wei, X Fu - Future Generation Computer …, 2024 - Elsevier
To efficiently process a large amount of deep neural network models can be challenging,
due to significant differences among models and even layers. Nowadays, systolic array has …

Exploring Memory-Oriented Design Optimization of Edge-AI Hardware for Extended Reality Applications

V Parmar, SS Sarwar, Z Li, HHS Lee, B De Salvo… - IEEE Micro, 2023 - ieeexplore.ieee.org
Low-power edge AI capabilities are essential for on-device extended reality (XR)
applications to support the vision of the metaverse. In this work, we investigate two …

Arax: a runtime framework for decoupling applications from heterogeneous accelerators

M Pavlidakis, S Mavridis, A Chazapis… - Proceedings of the 13th …, 2022 - dl.acm.org
Today, using multiple heterogeneous accelerators efficiently from applications and high-level
frameworks, such as TensorFlow and Caffe, poses significant challenges in three …

LCM: LLM-focused Hybrid SPM-cache Architecture with Cache Management for Multi-Core AI Accelerators

C Lai, Z Zhou, A Poptani, W Zhang - Proceedings of the 38th ACM …, 2024 - dl.acm.org
The proliferation of large language models (LLMs) with substantial computational
requirements and memory footprints has necessitated the design of more capable AI …

HeterGenMap: An Evolutionary Mapping Framework for Heterogeneous NoC-based Neuromorphic Systems

KN Dang, NAV Doan, ND Nguyen, AB Abdallah - IEEE Access, 2023 - ieeexplore.ieee.org
While task mapping for multi-core systems is known as an NP-hard problem, mapping for
neuromorphic systems even scale it up due to a high number of neurons per core and a high …

On Task Mapping in Multi-chiplet Based Many-core Systems to Optimize Inter-and Intra-chiplet Communications

X Wang, Y Wang, Y Jiang… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
Multi-chiplet system design, by integrating multiple chiplets/dielets within a single package,
has emerged as a promising paradigm in the post-Moore era. This paper introduces a novel …

A Low-Power General Matrix Multiplication Accelerator with Sparse Weight-and-Output Stationary Dataflow

P Liu, Y Wang - Micromachines, 2025 - mdpi.com
General matrix multiplication (GEMM) in machine learning involves massive computation
and data movement, which restricts its deployment on resource-constrained devices …