[Book][B] Parallel Computers 2: architecture, programming and algorithms

RW Hockney, CR Jesshope - 2019 - taylorfrancis.com
… , based on multiple instruction streams time-sharing a single … code, a 24-bit main core address
and two 7-bit index-register … of execution, and could be said to be in parallel execution. A …

RegMutex: Inter-warp GPU register time-sharing

F Khorasani, HA Esfeden… - … Architecture (ISCA), 2018 - ieeexplore.ieee.org
… RegMutex’s compiler and micro-architectural design. Section IV … registers during the program
execution for a sample thread … sets can execute in parallel, serializing only the portions that …

Portable and transparent software managed scheduling on accelerators for fair resource sharing

C Margiolas, MFP O'Boyle - … the 2016 international symposium on code …, 2016 - dl.acm.org
… Just In Time compiler that enables resource sharing control and … sharing: thread number,
local memory usage and register … 2, 4 or 8 parallel kernel execution requests. We first evaluate …

Duality cache for data parallel acceleration

D Fujiki, S Mahlke, R Das - … Symposium on Computer Architecture, 2019 - dl.acm.org
Modern general purpose processors and accelerators are in… register resources shared
by many execution units, our … We modify the source code of Omni Compiler to disable the …

Warp-consolidation: A novel execution model for gpus

A Li, W Liu, L Wang, K Barker, SL Song - Proceedings of the 2018 …, 2018 - dl.acm.org
parallel execution that workload is unbalanced due to the disparity arising from various aspects,
e.g., application code, input data, shared … warps via shared memory, we propose register

Alpaka--an abstraction library for parallel kernel acceleration

E Zenker, B Worpitz, R Widera, A Huebl… - … International Parallel …, 2016 - ieeexplore.ieee.org
… a parallel programming model to serve code maintainability … targeting the portable parallel
task execution within nodes. … within register memory and are not shared between threads. …

A compiler infrastructure for accelerator generators

R Nigam, S Thomas, Z Li, A Sampson - … Conference on Architectural …, 2021 - dl.acm.org
… This section introduces Calyx by using it to implement a parallel … Given the execution
schedule of our Calyx program, it is clear … To enable register sharing, we implement a live-range …

Plasticine: A reconfigurable architecture for parallel patterns

R Prabhakar, Y Zhang, D Koeplinger… - … Computer Architecture …, 2017 - dl.acm.org
… to simplify parallel programming and code generation for a … tor architecture optimized for
efficient execution of parallel … elements with register files, and a shared multi-ported register file. …

Using meta-heuristics and machine learning for software optimization of parallel computing systems: a systematic literature review

S Memeti, S Pllana, A Binotto, J Kołodziej, I Brandic - Computing, 2019 - Springer
Modern parallel computing architectures are complex due to … architectural specifications
such as cache and register … the program execution time is difficult to achieve in shared

Software compilation techniques for heterogeneous embedded multi-core systems

R Leupers, MA Aguilar, J Castrillon… - Handbook of Signal …, 2019 - Springer
… single-core compiler requires architecture information, such as … Typical backend steps
include code selection, register … standard parallel programming model for shared memory …