OuterSPACE: An outer product based sparse matrix multiplication accelerator

S Pal, J Beaumont, DH Park… - … Symposium on High …, 2018 - ieeexplore.ieee.org
Sparse matrices are widely used in graph and data analytics, machine learning, engineering
and scientific applications. This paper describes and analyzes OuterSPACE, an accelerator …

SynCron: Efficient synchronization support for near-data-processing architectures

C Giannoula, N Vijaykumar… - … Symposium on High …, 2021 - ieeexplore.ieee.org
Near-Data-Processing (NDP) architectures present a promising way to alleviate data
movement costs and can provide significant performance and energy benefits to parallel …

DeNovo: Rethinking the memory hierarchy for disciplined parallelism

B Choi, R Komuravelli, H Sung… - 2011 International …, 2011 - ieeexplore.ieee.org
For parallelism to become tractable for mass programmers, shared-memory languages and
environments must evolve to enforce disciplined practices that ban "wild shared-memory …

Complexity-effective multicore coherence

A Ros, S Kaxiras - Proceedings of the 21st international conference on …, 2012 - dl.acm.org
Much of the complexity and overhead (directory, state bits, invalidations) of a typical
directory coherence implementation stems from the effort to make it "invisible" even to the …

System and method for simplifying cache coherence using multiple write policies

S Kaxiras, A Ros - US Patent 9,274,960, 2016 - Google Patents
System and methods for cache coherence in a multi-core processing environment
having a local/shared cache hierarchy. The system includes multiple processor cores, a …

A new perspective for efficient virtual-cache coherence

S Kaxiras, A Ros - Proceedings of the 40th Annual International …, 2013 - dl.acm.org
Coherent shared virtual memory (cSVM) is highly coveted for heterogeneous architectures
as it will simplify programming across different cores and manycore accelerators. In this …

Abstract machine models and proxy architectures for exascale computing

JA Ang, RF Barrett, RE Benner, D Burke… - … -Software Co-Design …, 2014 - ieeexplore.ieee.org
To achieve exascale computing, fundamental hardware architectures must change. This will
significantly impact scientific applications that run on current high performance computing …

TSO-CC: Consistency directed cache coherence for TSO

M Elver, V Nagarajan - 2014 IEEE 20th International …, 2014 - ieeexplore.ieee.org
Traditional directory coherence protocols are designed for the strictest consistency model,
sequential consistency (SC). When they are used for chip multiprocessors (CMPs) that …

Turning centralized coherence and distributed critical-section execution on their head: A new approach for scalable distributed shared memory

S Kaxiras, D Klaftenegger, M Norgren, A Ros… - Proceedings of the 24th …, 2015 - dl.acm.org
A coherent global address space in a distributed system enables shared memory
programming in a much larger scale than a single multicore or a single SMP. Without …

DeNovoND: Efficient hardware support for disciplined non-determinism

H Sung, R Komuravelli, SV Adve - ACM SIGPLAN Notices, 2013 - dl.acm.org
Recent work has shown that disciplined shared-memory programming models that provide
deterministic-by-default semantics can simplify both parallel software and hardware …