An evaluation of global address space languages: Co-Array Fortran and Unified Parallel C

X Wei, J Shi, Y Chen, R Chen, H Chen - Proceedings of the 25th …, 2015 - dl.acm.org

We present DrTM, a fast in-memory transaction processing system that exploits advanced
hardware features (i.e., RDMA and HTM) to improve latency and throughput by over one …

Cited by 357 · Related articles · All 16 versions

A survey of parallel programming models and tools in the multi and many-core era

J Diaz, C Munoz-Caro, A Nino - IEEE Transactions on parallel …, 2012 - ieeexplore.ieee.org

In this work, we present a survey of the different parallel programming models and tools
available today with special consideration to their suitability for high-performance …

Cited by 445 · Related articles · All 6 versions

Scale-out NUMA

S Novakovic, A Daglis, E Bugnion, B Falsafi… - ACM SIGPLAN …, 2014 - dl.acm.org

Emerging datacenter applications operate on vast datasets that are kept in DRAM to
minimize latency. The large number of servers needed to accommodate this massive …

Cited by 229 · Related articles · All 22 versions

Exploring traditional and emerging parallel programming models using a proxy application

I Karlin, A Bhatele, J Keasler… - 2013 IEEE 27th …, 2013 - ieeexplore.ieee.org

Parallel machines are becoming more complex with increasing core counts and more
heterogeneous architectures. However, the commonly used parallel programming models …

Cited by 266 · Related articles · All 16 versions

Efficient distributed memory management with RDMA and caching

Q Cai, W Guo, H Zhang, D Agrawal, G Chen… - Proceedings of the …, 2018 - dl.acm.org

Recent advancements in high-performance networking interconnect significantly narrow the
performance gap between intra-node and inter-node communications, and open up …

Cited by 130 · Related articles · All 6 versions

An ephemeral burst-buffer file system for scientific applications

T Wang, K Mohror, A Moody, K Sato… - SC'16: Proceedings of …, 2016 - ieeexplore.ieee.org

Burst buffers are becoming an indispensable hardware resource on large-scale
supercomputers to buffer the bursty I/O from scientific applications. However, there is a lack …

Cited by 149 · Related articles · All 5 versions

A survey on the Distributed Computing stack

C Ramon-Cortes, P Alvarez, F Lordan, J Alvarez… - Computer Science …, 2021 - Elsevier

In this paper, we review the background and the state of the art of the Distributed Computing
software stack. We aim to provide the readers with a comprehensive overview of this area by …

Cited by 15 · Related articles · All 3 versions

Performance comparison of OpenMP, MPI, and MapReduce in practical problems

SJ Kang, SY Lee, KM Lee - Advances in Multimedia, 2015 - Wiley Online Library

With problem size and complexity increasing, several parallel and distributed programming
models and frameworks have been developed to efficiently handle such problems. This …

Cited by 95 · Related articles · All 9 versions

PARSECSs: Evaluating the impact of task parallelism in the PARSEC benchmark suite

D Chasapis, M Casas, M Moretó, R Vidal… - ACM Transactions on …, 2015 - dl.acm.org

In this work, we show how parallel applications can be implemented efficiently using task
parallelism. We also evaluate the benefits of such parallel paradigm with respect to other …

Cited by 82 · Related articles · All 6 versions

Scale-out non-uniform memory access

S Novakovic, A Daglis, BR Grot, E Bugnion… - US Patent …, 2017 - Google Patents

(57) ABSTRACT A computing system that uses a Scale-Out NUMA ("SONUMA")
architecture, programming model, and/or communication protocol provides for low-latency …

Cited by 59 · Related articles · All 4 versions
