Bamboo--Translating MPI applications to a latency-tolerant, data-driven form

CA Zeferino, ME Kreutz, L Carro… - … . 15th Symposium on …, 2002 - ieeexplore.ieee.org

Present days cores composing a system-on-chip might be interconnected by means of both
dedicated channels or shared buses. Nevertheless, future systems will have strong …

Cited by 139 - Related articles - All 6 versions

[PDF] wiley.com

Twister2: Design of a big data toolkit

S Kamburugamuve, K Govindarajan… - Concurrency and …, 2020 - Wiley Online Library

Data‐driven applications are essential to handle the ever‐increasing volume, velocity, and
veracity of data generated by sources such as the Web and Internet of Things (IoT) devices …

Cited by 31 - Related articles - All 2 versions

[PDF] iisc.ac.in

Compiling affine loop nests for a dynamic scheduling runtime on shared and distributed memory

R Dathathri, RT Mullapudi, U Bondhugula - ACM Transactions on …, 2016 - dl.acm.org

Current de-facto parallel programming models like OpenMP and MPI make it difficult to
extract task-level dataflow parallelism as opposed to bulk-synchronous parallelism. Task …

Cited by 24 - Related articles - All 3 versions

[PDF] ucl.ac.uk

Automated software transplantation

A Marginean - 2021 - discovery.ucl.ac.uk

Automated program repair has excited researchers for more than a decade, yet it has yet to
find full scale deployment in industry. We report our experience with SAPFIX: the first …

Cited by 8 - Related articles - All 2 versions

[PDF] inesctec.pt

Benchmarking polystores: the CloudMdsQL experience

B Kolev, R Pau, O Levchenko… - … Conference on Big …, 2016 - ieeexplore.ieee.org

The CloudMdsQL polystore provides integrated access to multiple heterogeneous data
stores, such as RDBMS, NoSQL or even HDFS through a big data analytics framework such …

Cited by 16 - Related articles - All 7 versions

[PDF] hal.science

Data movement in the Internet of Things domain

F D'andria, D Field, A Kopaneli, G Kousiouris… - Service Oriented and …, 2015 - Springer

Managing data produced in the Internet of Things according to the traditional data-center
based approach is becoming no longer appropriate. Devices are improving their …

Cited by 19 - Related articles - All 13 versions

[PDF] github.io

Compiler-assisted overlapping of communication and computation in MPI applications

J Guo, Q Yi, J Meng, J Zhang… - 2016 IEEE International …, 2016 - ieeexplore.ieee.org

The performance of distributed-memory applications, many of which are written in MPI,
critically depends on how well the applications can ameliorate the long latency of data …

Cited by 15 - Related articles - All 6 versions

[PDF] arxiv.org

Task-based algorithm for matrix multiplication: A step towards block-sparse tensor computing

JA Calvin, EF Valeev - arXiv preprint arXiv:1504.05046, 2015 - arxiv.org

Distributed-memory matrix multiplication (MM) is a key element of algorithms in many
domains (machine learning, quantum physics). Conventional algorithms for dense MM rely …

Cited by 14 - Related articles - All 2 versions

[PDF] google.com

Load balancing in large scale bayesian inference

D Wälchli, SM Martin, A Economides… - Proceedings of the …, 2020 - dl.acm.org

We present a novel strategy to improve load balancing for large scale Bayesian inference
problems. Load imbalance can be particularly destructive in generation based uncertainty …

Cited by 6 - Related articles - All 5 versions

[PDF] acm.org

Transforming blocking MPI collectives to non-blocking and persistent operations

H Ahmed, A Skjellum, P Bangalore… - Proceedings of the 24th …, 2017 - dl.acm.org

This paper describes Petal, a prototype tool that uses compiler-analysis techniques to
automate code transformations to hide communication costs behind computation by …

Cited by 8 - Related articles - All 3 versions
