A study on communication issues for systems-on-chip

CA Zeferino, ME Kreutz, L Carro… - … . 15th Symposium on …, 2002 - ieeexplore.ieee.org
Present-day cores composing a system-on-chip might be interconnected by means of either
dedicated channels or shared buses. Nevertheless, future systems will have strong …

Compiler techniques for massively scalable implicit task parallelism

TG Armstrong, JM Wozniak, M Wilde… - SC'14: Proceedings of …, 2014 - ieeexplore.ieee.org
Swift/T is a high-level language for writing concise, deterministic scripts that compose serial
or parallel codes implemented in lower-level programming models into large-scale parallel …

Bamboo--Translating MPI applications to a latency-tolerant, data-driven form

T Nguyen, P Cicotti, E Bylaska… - SC'12: Proceedings …, 2012 - ieeexplore.ieee.org
We present Bamboo, a custom source-to-source translator that transforms MPI C source into
a data-driven form that automatically overlaps communication with available computation …

Loop chaining: A programming abstraction for balancing locality and parallelism

CD Krieger, MM Strout, C Olschanowsky… - … on Parallel & …, 2013 - ieeexplore.ieee.org
There is a significant, established code base in the scientific computing community. Some of
these codes have been parallelized already but are now encountering scalability issues due …

Generalizing run-time tiling with the loop chain abstraction

MM Strout, F Luporini, CD Krieger… - 2014 IEEE 28th …, 2014 - ieeexplore.ieee.org
Many scientific applications are organized in a data parallel way: as sequences of parallel
and/or reduction loops. This exposes parallelism well, but does not convert data reuse …

Identifying and scheduling loop chains using directives

IJ Bertolacci, MM Strout, S Guzik, J Riley… - … Third Workshop on …, 2016 - ieeexplore.ieee.org
Exposing opportunities for parallelization while explicitly managing data locality is the
primary challenge to porting and optimizing existing computational science simulation codes …

Using the loop chain abstraction to schedule across loops in existing code

IJ Bertolacci, MM Strout, J Riley… - … Journal of High …, 2019 - inderscienceonline.com
Exposing opportunities for parallelisation while explicitly managing data locality is the
primary challenge to porting and optimising computational science simulation codes to …

Parallel genetic algorithm for VLSI standard cell placement

P Subbaraj, SS Sankar, S Anand - … International Conference on …, 2009 - ieeexplore.ieee.org
This work addresses methods for solving the VLSI standard cell placement problem with the
objectives of minimizing the wire length and computational time. In this work a parallel GA …

Perilla: Metadata-based optimizations of an asynchronous runtime for adaptive mesh refinement

T Nguyen, D Unat, W Zhang, A Almgren… - SC'16: Proceedings …, 2016 - ieeexplore.ieee.org
Hardware architecture is increasingly complex, urging the development of asynchronous
runtime systems with advanced resource and locality management support. However, these …

Automatic translation of MPI source into a latency-tolerant, data-driven form

T Nguyen, P Cicotti, E Bylaska, D Quinlan… - Journal of Parallel and …, 2017 - Elsevier
Hiding communication behind useful computation is an important performance programming
technique but remains an inscrutable programming exercise even for the expert. We present …