AI accelerator on IBM Telum processor: Industrial product
C Lichtenau, A Buyuktosunoglu, R Bertran… - Proceedings of the 49th …, 2022 - dl.acm.org
IBM Telum is the next generation processor chip for IBM Z and LinuxONE systems. The
Telum design is focused on enterprise class workloads and it achieves over 40% per socket …
REED: Chiplet-based accelerator for fully homomorphic encryption
Abstract Fully Homomorphic Encryption (FHE) enables privacy-preserving computation and
has many applications. However, its practical implementation faces massive computation …
Enhancing interconnection network topology for chiplet-based systems: An automated design framework
Z Cao, Q Liu, Z Wan, W Zhang, K Song, W Liu - Future Generation …, 2025 - Elsevier
Chiplet-based systems integrate discrete chips on an interposer and use the interconnection
network to enable communication between different components. The topology of the …
HSAS: Efficient task scheduling for large scale heterogeneous systolic array accelerator cluster
K Yan, Y Song, T Liu, J Tan, X Wei, X Fu - Future Generation Computer …, 2024 - Elsevier
To efficiently process a large amount of deep neural network models can be challenging,
due to significant differences among models and even layers. Nowadays, systolic array has …
Exploring Memory-Oriented Design Optimization of Edge-AI Hardware for Extended Reality Applications
Low-power edge AI capabilities are essential for on-device extended reality (XR)
applications to support the vision of the metaverse. In this work, we investigate two …
Arax: a runtime framework for decoupling applications from heterogeneous accelerators
M Pavlidakis, S Mavridis, A Chazapis… - Proceedings of the 13th …, 2022 - dl.acm.org
Today, using multiple heterogeneous accelerators efficiently from applications and high-level
frameworks, such as TensorFlow and Caffe, poses significant challenges in three …
LCM: LLM-focused Hybrid SPM-cache Architecture with Cache Management for Multi-Core AI Accelerators
The proliferation of large language models (LLMs) with substantial computational
requirements and memory footprints has necessitated the design of more capable AI …
HeterGenMap: An Evolutionary Mapping Framework for Heterogeneous NoC-based Neuromorphic Systems
While task mapping for multi-core systems is known as an NP-hard problem, mapping for
neuromorphic systems even scale it up due to a high number of neurons per core and a high …
On Task Mapping in Multi-chiplet Based Many-core Systems to Optimize Inter-and Intra-chiplet Communications
X Wang, Y Wang, Y Jiang… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
Multi-chiplet system design, by integrating multiple chiplets/dielets within a single package,
has emerged as a promising paradigm in the post-Moore era. This paper introduces a novel …
A Low-Power General Matrix Multiplication Accelerator with Sparse Weight-and-Output Stationary Dataflow
P Liu, Y Wang - Micromachines, 2025 - mdpi.com
General matrix multiplication (GEMM) in machine learning involves massive computation
and data movement, which restricts its deployment on resource-constrained devices …