Stream-based memory access specialization for general purpose processors

Z Wang, T Nowatzki - Proceedings of the 46th International Symposium …, 2019 - dl.acm.org
Because of severe limitations in technology scaling, architects have innovated in
specializing general purpose processors for computation primitives (e.g. vector instructions …

Unlimited vector extension with data streaming support

JM Domingos, N Neves, N Roma… - 2021 ACM/IEEE 48th …, 2021 - ieeexplore.ieee.org
Unlimited vector extension (UVE) is a novel instruction set architecture extension that takes
streaming and SIMD processing together into the modern computing scenario. It aims to …

DPU-v2: Energy-efficient execution of irregular directed acyclic graphs

N Shah, W Meert, M Verhelst - 2022 55th IEEE/ACM …, 2022 - ieeexplore.ieee.org
A growing number of applications like probabilistic machine learning, sparse linear algebra,
robotic navigation, etc., exhibit irregular data flow computation that can be modeled with …

Novia: A framework for discovering non-conventional inline accelerators

D Trilla, JD Wellman, A Buyuktosunoglu… - MICRO-54: 54th Annual …, 2021 - dl.acm.org
Accelerators provide an increasingly valuable source of performance in modern computing
systems. In most cases, accelerators are implemented as stand-alone, offload engines to …

Deepframe: A profile-driven compiler for spatial hardware accelerators

A Guha, N Vedula, A Shriraman - 2019 28th International …, 2019 - ieeexplore.ieee.org
Tracing code paths to form extended basic blocks is useful in many areas, compiler
optimizations [1], improving instruction cache behavior [2] and custom-hardware offloading …

Characterizing diverse handheld apps for customized hardware acceleration

PV Rengasamy, H Zhang… - 2017 IEEE …, 2017 - ieeexplore.ieee.org
Current handhelds incorporate a variety of accelerators/IPs for improving their performance
and energy efficiency. While these IPs are extremely useful for accelerating parts of a …

Mirage cores: The illusion of many out-of-order cores using in-order hardware

S Padmanabha, A Lukefahr, R Das… - Proceedings of the 50th …, 2017 - dl.acm.org
Heterogeneous chip multiprocessors (Het-CMPs) offer a combination of large Out-of-Order
(OoO) cores optimized for high single-threaded performance and small In-Order (InO) cores …

[BOOK][B] Efficient Execution of Irregular Dataflow Graphs: Hardware/Software Co-optimization for Probabilistic AI and Sparse Linear Algebra

N Shah, W Meert, M Verhelst - 2023 - books.google.com
This book focuses on the acceleration of emerging irregular sparse workloads, posed by
novel artificial intelligence (AI) models and sparse linear algebra. Specifically, the book …

Decentralized offload-based execution on memory-centric compute cores

S Baskaran, J Sampson - … of the International Symposium on Memory …, 2020 - dl.acm.org
With the end of Dennard scaling, power constraints have led to increasing compute
specialization in the form of differently specialized accelerators integrated at various levels …

Nachos: Software-driven hardware-assisted memory disambiguation for accelerators

N Vedula, A Shriraman, S Kumar… - … Symposium on High …, 2018 - ieeexplore.ieee.org
Hardware accelerators have relied on the compiler to extract instruction parallelism but may
waste significant energy in enforcing memory ordering and discovering memory parallelism …