Swift/T: Large-scale application composition via distributed-memory dataflow processing

JM Wozniak, TG Armstrong, M Wilde… - 2013 13th IEEE/ACM …, 2013 - ieeexplore.ieee.org
Many scientific applications are conceptually built up from independent component tasks as
a parameter study, optimization, or other search. Large batches of these tasks may be …

Recomputing coverage information to assist regression testing

PK Chittimalli, MJ Harrold - IEEE Transactions on Software …, 2009 - ieeexplore.ieee.org
This paper presents a technique that leverages an existing regression test selection
algorithm to compute accurate, updated coverage data on a version of the software, P_{i+1} …

A study on communication issues for systems-on-chip

CA Zeferino, ME Kreutz, L Carro… - … . 15th Symposium on …, 2002 - ieeexplore.ieee.org
Present days cores composing a system-on-chip might be interconnected by means of both
dedicated channels or shared buses. Nevertheless, future systems will have strong …

Swift/T: Scalable data flow programming for many-task applications

JM Wozniak, TG Armstrong, M Wilde, DS Katz… - Proceedings of the 18th …, 2013 - dl.acm.org
Justin M. Wozniak Argonne National …

Bamboo--Translating MPI applications to a latency-tolerant, data-driven form

T Nguyen, P Cicotti, E Bylaska… - SC'12: Proceedings …, 2012 - ieeexplore.ieee.org
We present Bamboo, a custom source-to-source translator that transforms MPI C source into
a data-driven form that automatically overlaps communication with available computation …

[BOOK][B] Programming models for parallel computing

P Balaji - 2015 - books.google.com
An overview of the most prominent contemporary parallel processing programming models,
written in a unique tutorial style. With the coming of the parallel computing era, computer …

Data movement in data-intensive high performance computing

P Cicotti, S Oral, G Kestor, R Gioiosa, S Strande… - Conquering Big Data …, 2016 - Springer
The cost of executing a floating point operation has been decreasing for decades at a much
higher rate than that of moving data. Bandwidth and latency, two key metrics that determine …

Perilla: Metadata-based optimizations of an asynchronous runtime for adaptive mesh refinement

T Nguyen, D Unat, W Zhang, A Almgren… - SC'16: Proceedings …, 2016 - ieeexplore.ieee.org
Hardware architecture is increasingly complex, urging the development of asynchronous
runtime systems with advance resource and locality management supports. However, these …

Automatic translation of MPI source into a latency-tolerant, data-driven form

T Nguyen, P Cicotti, E Bylaska, D Quinlan… - Journal of Parallel and …, 2017 - Elsevier
Hiding communication behind useful computation is an important performance programming
technique but remains an inscrutable programming exercise even for the expert. We present …

POSTER: Utilizing dataflow-based execution for coupled cluster methods

H McCraw, A Danalis, T Herault… - 2014 IEEE …, 2014 - ieeexplore.ieee.org
Computational chemistry comprises one of the driving forces of High Performance
Computing. In particular, many-body methods, such as Coupled Cluster methods (CC) [1] of …