SynCron: Efficient synchronization support for near-data-processing architectures

C Giannoula, N Vijaykumar… - … Symposium on High …, 2021 - ieeexplore.ieee.org
Near-Data-Processing (NDP) architectures present a promising way to alleviate data
movement costs and can provide significant performance and energy benefits to parallel …

Co-designing accelerators and SoC interfaces using gem5-Aladdin

YS Shao, SL Xi, V Srinivasan, GY Wei… - 2016 49th Annual …, 2016 - ieeexplore.ieee.org
Increasing demand for power-efficient, high-performance computing has spurred a growing
number and diversity of hardware accelerators in mobile and server Systems on Chip …

Heterogeneous system coherence for integrated CPU-GPU systems

J Power, A Basu, J Gu, S Puthoor… - Proceedings of the 46th …, 2013 - dl.acm.org
Many future heterogeneous systems will integrate CPUs and GPUs physically on a single
chip and logically connect them via shared memory to avoid explicit data copying. Making …

Cache coherence for GPU architectures

I Singh, A Shriraman, WWL Fung… - 2013 IEEE 19th …, 2013 - ieeexplore.ieee.org
While scalable coherence has been extensively studied in the context of general purpose
chip multiprocessors (CMPs), GPU architectures present a new set of challenges …

Throughput-effective on-chip networks for manycore accelerators

A Bakhoda, J Kim, TM Aamodt - 2010 43rd Annual IEEE/ACM …, 2010 - ieeexplore.ieee.org
As the number of cores and threads in manycore compute accelerators such as Graphics
Processing Units (GPU) increases, so does the importance of on-chip interconnection …

Heterogeneous-race-free memory models

DR Hower, BA Hechtman, BM Beckmann… - Proceedings of the 19th …, 2014 - dl.acm.org
Commodity heterogeneous systems (e.g., integrated CPUs and GPUs) now support a
unified, shared memory address space for all components. Because the latency of global …

Cohmeleon: Learning-based orchestration of accelerator coherence in heterogeneous SoCs

J Zuckerman, D Giri, J Kwon, P Mantovani… - MICRO-54: 54th Annual …, 2021 - dl.acm.org
One of the most critical aspects of integrating loosely-coupled accelerators in
heterogeneous SoC architectures is orchestrating their interactions with the memory …

DeNovoND: Efficient hardware support for disciplined non-determinism

H Sung, R Komuravelli, SV Adve - ACM SIGPLAN Notices, 2013 - dl.acm.org
Recent work has shown that disciplined shared-memory programming models that provide
deterministic-by-default semantics can simplify both parallel software and hardware …

ArMOR: Defending against memory consistency model mismatches in heterogeneous architectures

D Lustig, C Trippel, M Pellauer… - Proceedings of the 42nd …, 2015 - dl.acm.org
Architectural heterogeneity is increasing: numerous products and studies have proven the
benefits of combining cores and accelerators with varying ISAs into a single system …

Exploring memory consistency for massively-threaded throughput-oriented processors

BA Hechtman, DJ Sorin - Proceedings of the 40th Annual International …, 2013 - dl.acm.org
We re-visit the issue of hardware consistency models in the new context of
massively-threaded throughput-oriented processors (MTTOPs). A prominent example of an MTTOP is a …