A study on communication issues for systems-on-chip

CA Zeferino, ME Kreutz, L Carro… - … . 15th Symposium on …, 2002 - ieeexplore.ieee.org
Present days cores composing a system-on-chip might be interconnected by means of both
dedicated channels or shared buses. Nevertheless, future systems will have strong …

Twister2: Design of a big data toolkit

S Kamburugamuve, K Govindarajan… - Concurrency and …, 2020 - Wiley Online Library
Data‐driven applications are essential to handle the ever‐increasing volume, velocity, and
veracity of data generated by sources such as the Web and Internet of Things (IoT) devices …

Compiling affine loop nests for a dynamic scheduling runtime on shared and distributed memory

R Dathathri, RT Mullapudi, U Bondhugula - ACM Transactions on …, 2016 - dl.acm.org
Current de-facto parallel programming models like OpenMP and MPI make it difficult to
extract task-level dataflow parallelism as opposed to bulk-synchronous parallelism. Task …

Automated software transplantation

A Marginean - 2021 - discovery.ucl.ac.uk
Automated program repair has excited researchers for more than a decade, yet it has yet to
find full scale deployment in industry. We report our experience with SAPFIX: the first …

Benchmarking polystores: the CloudMdsQL experience

B Kolev, R Pau, O Levchenko… - … Conference on Big …, 2016 - ieeexplore.ieee.org
The CloudMdsQL polystore provides integrated access to multiple heterogeneous data
stores, such as RDBMS, NoSQL or even HDFS through a big data analytics framework such …

Data movement in the Internet of Things domain

F D'Andria, D Field, A Kopaneli, G Kousiouris… - Service Oriented and …, 2015 - Springer
Managing data produced in the Internet of Things according to the traditional data-center
based approach is becoming no longer appropriate. Devices are improving their …

Compiler-assisted overlapping of communication and computation in MPI applications

J Guo, Q Yi, J Meng, J Zhang… - 2016 IEEE International …, 2016 - ieeexplore.ieee.org
The performance of distributed-memory applications, many of which are written in MPI,
critically depends on how well the applications can ameliorate the long latency of data …

Task-based algorithm for matrix multiplication: A step towards block-sparse tensor computing

JA Calvin, EF Valeev - arXiv preprint arXiv:1504.05046, 2015 - arxiv.org
Distributed-memory matrix multiplication (MM) is a key element of algorithms in many
domains (machine learning, quantum physics). Conventional algorithms for dense MM rely …

Load balancing in large scale Bayesian inference

D Wälchli, SM Martin, A Economides… - Proceedings of the …, 2020 - dl.acm.org
We present a novel strategy to improve load balancing for large scale Bayesian inference
problems. Load imbalance can be particularly destructive in generation based uncertainty …

Transforming blocking MPI collectives to non-blocking and persistent operations

H Ahmed, A Skjellumh, P Bangalore… - Proceedings of the 24th …, 2017 - dl.acm.org
This paper describes Petal, a prototype tool that uses compiler-analysis techniques to
automate code transformations to hide communication costs behind computation by …