Realm: An event-based low-level runtime for distributed memory architectures

S Treichler, M Bauer, A Aiken - … of the 23rd international conference on …, 2014 - dl.acm.org
We present Realm, an event-based runtime system for heterogeneous, distributed memory
machines. Realm is fully asynchronous: all runtime actions are non-blocking. Realm …

Bamboo: Translating MPI applications to a latency-tolerant, data-driven form

T Nguyen, P Cicotti, E Bylaska… - SC'12: Proceedings …, 2012 - ieeexplore.ieee.org
We present Bamboo, a custom source-to-source translator that transforms MPI C source into
a data-driven form that automatically overlaps communication with available computation …

Latency hiding and performance tuning with graph-based execution

P Cicotti, SB Baden - 2011 First Workshop on Data-Flow …, 2011 - ieeexplore.ieee.org
In current practice, scientific programmers and HPC users are required to develop code
that exposes a high degree of parallelism, exhibits high locality, dynamically adapts to the …

Automatic translation of MPI source into a latency-tolerant, data-driven form

T Nguyen, P Cicotti, E Bylaska, D Quinlan… - Journal of Parallel and …, 2017 - Elsevier
Hiding communication behind useful computation is an important performance programming
technique but remains an inscrutable programming exercise even for the expert. We present …

A framework for integrating mobility into collaborative business processes

I Hawryszkiewycz, R Steele - International Conference on …, 2005 - ieeexplore.ieee.org
Most collaborative systems assume that users have access to workspaces through which
they can access the entire collaborative context. This is often not the case with mobile users …

Hiding communication latency with non-SPMD, graph-based execution

J Sorensen, SB Baden - … –ICCS 2009: 9th International Conference Baton …, 2009 - Springer
Reformulating an algorithm to mask communication delays is crucial in maintaining
scalability, but traditional solutions embed the overlap strategy into the application. We …

Towards Intelligent Runtime Framework for Distributed Heterogeneous Systems

P Thomadakis - 2023 - search.proquest.com
Scientific applications strive for increased memory and computing performance, requiring
massive amounts of data and time to produce results. Applications utilize large-scale …

MATE, a Unified Model for Communication-Tolerant Scientific Applications

SM Martin, SB Baden - … Workshop on Languages and Compilers for …, 2018 - Springer
We present MATE, a model for developing communication-tolerant scientific applications.
MATE employs a combination of mechanisms to reduce or hide the cost of network and intra …

Preliminary scaling results on multiple hybrid nodes of Knights Corner and Sandy Bridge processors

T Nguyen, SB Baden - … on Domain-Specific Languages and High …, 2013 - bamboo.ucsd.edu
We discuss our experience in optimizing a stencil method on an Intel Xeon Phi-based
cluster. We describe our solutions to three challenges: tolerating the high cost of inter-node …

Study on feature extraction method based on parallel coordinate plots

C Jianxin, H Wenxue, G Haibo - 2008 International Conference …, 2008 - ieeexplore.ieee.org
A novel feature extraction method based on parallel coordinate plots was presented.
Observing the parallel coordinate plots, the authors discovered that using the distance of one point to …