Simba: scaling deep-learning inference with chiplet-based architecture

C Lichtenau, A Buyuktosunoglu, R Bertran… - Proceedings of the 49th …, 2022 - dl.acm.org

IBM Telum is the next generation processor chip for IBM Z and LinuxONE systems. The
Telum design is focused on enterprise class workloads and it achieves over 40% per socket …

Cited by 15

REED: Chiplet-based accelerator for fully homomorphic encryption

A Aikata, AC Mert, S Kwon, M Deryabin… - Cryptology ePrint …, 2023 - eprint.iacr.org

Abstract Fully Homomorphic Encryption (FHE) enables privacy-preserving computation and
has many applications. However, its practical implementation faces massive computation …

Cited by 7

Enhancing interconnection network topology for chiplet-based systems: An automated design framework

Z Cao, Q Liu, Z Wan, W Zhang, K Song, W Liu - Future Generation …, 2025 - Elsevier

Chiplet-based systems integrate discrete chips on an interposer and use the interconnection
network to enable communication between different components. The topology of the …

HSAS: Efficient task scheduling for large scale heterogeneous systolic array accelerator cluster

K Yan, Y Song, T Liu, J Tan, X Wei, X Fu - Future Generation Computer …, 2024 - Elsevier

To efficiently process a large amount of deep neural network models can be challenging,
due to significant differences among models and even layers. Nowadays, systolic array has …

Cited by 2

Exploring Memory-Oriented Design Optimization of Edge-AI Hardware for Extended Reality Applications

V Parmar, SS Sarwar, Z Li, HHS Lee, B De Salvo… - IEEE Micro, 2023 - ieeexplore.ieee.org

Low-power edge AI capabilities are essential for on-device extended reality (XR)
applications to support the vision of the metaverse. In this work, we investigate two …

Cited by 6

Arax: a runtime framework for decoupling applications from heterogeneous accelerators

M Pavlidakis, S Mavridis, A Chazapis… - Proceedings of the 13th …, 2022 - dl.acm.org

Today, using multiple heterogeneous accelerators efficiently from applications and high-level
frameworks, such as TensorFlow and Caffe, poses significant challenges in three …

Cited by 5

LCM: LLM-focused Hybrid SPM-cache Architecture with Cache Management for Multi-Core AI Accelerators

C Lai, Z Zhou, A Poptani, W Zhang - Proceedings of the 38th ACM …, 2024 - dl.acm.org

The proliferation of large language models (LLMs) with substantial computational
requirements and memory footprints has necessitated the design of more capable AI …

Cited by 6

HeterGenMap: An Evolutionary Mapping Framework for Heterogeneous NoC-based Neuromorphic Systems

KN Dang, NAV Doan, ND Nguyen, AB Abdallah - IEEE Access, 2023 - ieeexplore.ieee.org

While task mapping for multi-core systems is known as an NP-hard problem, mapping for
neuromorphic systems even scale it up due to a high number of neurons per core and a high …

Cited by 1

On Task Mapping in Multi-chiplet Based Many-core Systems to Optimize Inter-and Intra-chiplet Communications

X Wang, Y Wang, Y Jiang… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org

Multi-chiplet system design, by integrating multiple chiplets/dielets within a single package,
has emerged as a promising paradigm in the post-Moore era. This paper introduces a novel …

A Low-Power General Matrix Multiplication Accelerator with Sparse Weight-and-Output Stationary Dataflow

P Liu, Y Wang - Micromachines, 2025 - mdpi.com

General matrix multiplication (GEMM) in machine learning involves massive computation
and data movement, which restricts its deployment on resource-constrained devices …
