gProximity: hierarchical GPU‐based operations for collision and distance queries

C Lauterbach, Q Mo, D Manocha - Computer Graphics Forum, 2010 - Wiley Online Library
We present novel parallel algorithms for collision detection and separation distance
computation for rigid and deformable models that exploit the computational capabilities of …

StreamScan: fast scan algorithms for GPUs without global barrier synchronization

S Yan, G Long, Y Zhang - Proceedings of the 18th ACM SIGPLAN …, 2013 - dl.acm.org
Scan (also known as prefix sum) is a very useful primitive for various important parallel
algorithms, such as sort, BFS, SpMV, compaction and so on. Current state of the art of GPU …

Efficient GPU spatial-temporal multitasking

Y Liang, HP Huynh, K Rupnow… - IEEE Transactions on …, 2014 - ieeexplore.ieee.org
Heterogeneous computing nodes are now pervasive throughout computing, and GPUs have
emerged as a leading computing device for application acceleration. GPUs have …

ScatterAlloc: Massively parallel dynamic memory allocation for the GPU

M Steinberger, M Kenzel, B Kainz… - 2012 Innovative …, 2012 - ieeexplore.ieee.org
In this paper, we analyze the special requirements of a dynamic memory allocator that is
designed for massively parallel architectures such as Graphics Processing Units (GPUs) …

Processing data streams with hard real-time constraints on heterogeneous systems

U Verner, A Schuster, M Silberstein - Proceedings of the international …, 2011 - dl.acm.org
Data stream processing applications such as stock exchange data analysis, VoIP streaming,
and sensor data processing pose two conflicting challenges: short per-stream latency--to …

Scheduling processing of real-time data streams on heterogeneous multi-GPU systems

U Verner, A Schuster, M Silberstein… - Proceedings of the 5th …, 2012 - dl.acm.org
Processing vast numbers of data streams is a common problem in modern computer
systems and is known as the "online big data problem." Adding hard real-time constraints to …

Dynamic task parallelism with a GPU work-stealing runtime system

S Chatterjee, M Grossman, A Sbîrlea… - Languages and Compilers …, 2013 - Springer
NVIDIA's Compute Unified Device Architecture (CUDA) enabled GPUs become
accessible to mainstream programming. Abundance of simple computational cores and high …

Fast interactive simulations of cardiac electrical activity in anatomically accurate heart structures by compressing sparse uniform cartesian grids

A Kaboudian, RA Gray, I Uzelac, EM Cherry… - Computer Methods and …, 2024 - Elsevier
Background and Objective: Numerical simulations are valuable tools for studying
cardiac arrhythmias. Not only do they complement experimental studies, but there is also an …

Sorting with GPUs: A survey

DI Arkhipov, D Wu, K Li, AC Regan - arXiv preprint arXiv:1709.02520, 2017 - arxiv.org
Sorting is a fundamental operation in computer science and is a bottleneck in many
important fields. Sorting is critical to database applications, online search and indexing …

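Several key issues of multi-core software and their research progress

杨际祥, 谭国真, 王荣生 - 电子学报, 2010 - ejournal.org.cn
Improving application development productivity while also obtaining parallel performance gains
is the core goal of research on mainstream multi-core parallel computing. Taking an
application-driven, top-down approach, this paper surveys three key issues that affect this goal. First …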