High performance and scalable radix sorting: A case study of implementing dynamic parallelism for GPU computing

D Merrill, A Grimshaw - Parallel Processing Letters, 2011 - World Scientific
The need to rank and order data is pervasive, and many algorithms are fundamentally
dependent upon sorting and partitioning operations. Prior to this work, GPU stream …

Revisiting sorting for GPGPU stream architectures

DG Merrill, AS Grimshaw - … of the 19th international conference on …, 2010 - dl.acm.org
This poster presents efficient strategies for sorting large sequences of fixed-length keys (and
values) using GPGPU stream processors. Compared to the state-of-the-art, our radix sorting …
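To show what radix sorting fixed-length keys (and values) involves, here is a minimal sequential sketch of a least-significant-digit (LSD) radix sort over unsigned 32-bit keys with attached values. The function name `lsd_radix_sort_pairs` and the 8-bit digit width are assumptions for illustration. This is not Merrill and Grimshaw's GPU implementation. A GPU version would typically parallelize the per-digit histogram, prefix-sum, and scatter steps shown here.

```python
def lsd_radix_sort_pairs(keys, values, key_bits=32, digit_bits=8):
    """Stable LSD radix sort of (key, value) pairs.

    Hypothetical helper: assumes keys are non-negative integers below 2**key_bits.
    """
    assert len(keys) == len(values)
    radix = 1 << digit_bits
    mask = radix - 1
    for shift in range(0, key_bits, digit_bits):
        # 1. Histogram: count how many keys fall into each bucket for this digit.
        counts = [0] * radix
        for k in keys:
            counts[(k >> shift) & mask] += 1
        # 2. Exclusive prefix sum turns bucket counts into starting offsets.
        offsets = [0] * radix
        running = 0
        for d in range(radix):
            offsets[d] = running
            running += counts[d]
        # 3. Scatter keys and values together. Visiting the input in order keeps
        #    equal digits in their previous relative order, so each pass is stable.
        out_keys = [0] * len(keys)
        out_values = [None] * len(values)
        for k, v in zip(keys, values):
            d = (k >> shift) & mask
            out_keys[offsets[d]] = k
            out_values[offsets[d]] = v
            offsets[d] += 1
        keys, values = out_keys, out_values
    return keys, values
```

For example, `lsd_radix_sort_pairs([170, 45, 75, 90, 2, 802, 24, 66], list("abcdefgh"))` returns the keys `[2, 24, 45, 66, 75, 90, 170, 802]` with the values `['e', 'g', 'b', 'h', 'c', 'd', 'a', 'f']`. Stability of each pass is what makes the digit-by-digit approach correct.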

Divergence analysis and optimizations

B Coutinho, D Sampaio, FMQ Pereira… - 2011 International …, 2011 - ieeexplore.ieee.org
The growing interest in GPU programming has brought renewed attention to the Single
Instruction Multiple Data (SIMD) execution model. SIMD machines give application …

StreamScan: fast scan algorithms for GPUs without global barrier synchronization

S Yan, G Long, Y Zhang - Proceedings of the 18th ACM SIGPLAN …, 2013 - dl.acm.org
Scan (also known as prefix sum) is a very useful primitive for various important parallel
algorithms, such as sort, BFS, SpMV, compaction and so on. Current state of the art of GPU …
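Because compaction is listed among scan's uses, a short sequential sketch may help show the connection. An exclusive prefix sum over 0/1 keep-flags gives each surviving element its output index. The function names are hypothetical, and the loops are serial. StreamScan is about computing such scans on GPUs without global barrier synchronization, which this sketch does not attempt to model.

```python
def exclusive_scan(xs):
    """Exclusive prefix sum: out[i] = xs[0] + ... + xs[i-1], with out[0] = 0.

    Returns the scanned list and the total sum.
    """
    out = []
    total = 0
    for x in xs:
        out.append(total)
        total += x
    return out, total


def compact(items, keep):
    """Stream compaction: keep items for which keep(x) is true, preserving order."""
    flags = [1 if keep(x) else 0 for x in items]
    # The scan value at each kept position is that element's destination index,
    # and the scan total is the size of the compacted output.
    positions, count = exclusive_scan(flags)
    out = [None] * count
    for x, f, p in zip(items, flags, positions):
        if f:
            out[p] = x
    return out
```

For example, `exclusive_scan([3, 1, 7, 0, 4])` returns `([0, 3, 4, 11, 11], 15)`, and `compact([5, -2, 8, -1, 3], lambda x: x > 0)` returns `[5, 8, 3]`. Each scatter depends only on its own scan value, so every element can be written independently once the scan is known.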

Evaluating multi-GPU sorting with modern interconnects

T Maltenberger, I Ilic, I Tolovski, T Rabl - Proceedings of the 2022 …, 2022 - dl.acm.org
GPUs have become a mainstream accelerator for database operations such as sorting. Most
GPU sorting algorithms are single-GPU approaches. They neither harness the full …

Survey of GPU based sorting algorithms

DP Singh, I Joshi, J Choudhary - International Journal of Parallel …, 2018 - Springer
Parallel sorting algorithms are widely studied nowadays. After the introduction of parallel
processors such as graphics processing unit (GPU) and easy to use parallel programming …

[BOOK] Efficient parallel merge sort for fixed and variable length keys

A Davidson, D Tarjan, M Garland, JD Owens - 2012 - ieeexplore.ieee.org
We design a high-performance parallel merge sort for highly parallel systems. Our merge
sort is designed to use more register communication (not shared memory), and does not …

Sorting in memristive memory

MR Alam, MH Najafi, N TaheriNejad - ACM Journal on Emerging …, 2022 - dl.acm.org
Sorting data is needed in many application domains. Traditionally, the data is read from
memory and sent to a general-purpose processor or application-specific hardware for …

Fast k-selection algorithms for graphics processing units

T Alabi, JD Blanchard, B Gordon… - Journal of Experimental …, 2012 - dl.acm.org
Finding the kth-largest value in a list of n values is a well-studied problem for which many
algorithms have been proposed. A naïve approach is to sort the list and then simply select …
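The naïve approach in the snippet can be contrasted with a standard alternative in a few lines. The sketch below uses 1-indexed k, and the function names are hypothetical. Quickselect is a well-known expected-linear-time selection method, included here for comparison only. It is not presented as one of the GPU algorithms the paper evaluates.

```python
import random


def kth_largest_by_sort(values, k):
    """Naive selection: sort in descending order, then index. O(n log n)."""
    if not 1 <= k <= len(values):
        raise ValueError("k must be in 1..len(values)")
    return sorted(values, reverse=True)[k - 1]


def kth_largest_quickselect(values, k, seed=0):
    """Expected O(n) selection: partition around a random pivot, keep one side."""
    if not 1 <= k <= len(values):
        raise ValueError("k must be in 1..len(values)")
    rng = random.Random(seed)
    candidates = list(values)  # work on a copy; the input is not modified
    while True:
        pivot = rng.choice(candidates)
        larger = [x for x in candidates if x > pivot]
        equal_count = sum(1 for x in candidates if x == pivot)
        if k <= len(larger):
            candidates = larger  # the answer lies among the larger elements
        elif k <= len(larger) + equal_count:
            return pivot  # the answer equals the pivot
        else:
            # Skip past the larger and equal elements, then search the smaller ones.
            k -= len(larger) + equal_count
            candidates = [x for x in candidates if x < pivot]
```

With `data = [7, 2, 9, 4, 9, 1]`, `kth_largest_by_sort(data, 2)` returns `9` and `kth_largest_quickselect(data, 3)` returns `7`. Quickselect avoids fully ordering the list, because it only ever recurses into the side that contains the answer.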

An Efficient O(N) Comparison-Free Sorting Algorithm

S Abdel-Hafeez, A Gordon-Ross - IEEE Transactions on Very …, 2017 - ieeexplore.ieee.org
In this paper, we propose a novel sorting algorithm that sorts input data integer elements
on-the-fly without any comparison operations between the data (comparison-free sorting). We …
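Counting sort is a textbook example of sorting integers without comparing elements to one another. It tallies how often each value occurs and then emits the values in index order. It appears here only as a generic illustration of comparison-free sorting. It is not the hardware design Abdel-Hafeez and Gordon-Ross propose. The small, known value range `max_value` is an assumption of this sketch.

```python
def counting_sort(values, max_value):
    """Sort non-negative integers in [0, max_value] without comparing elements.

    The range check compares each value against fixed bounds, never against
    another element. Runtime is O(n + max_value).
    """
    counts = [0] * (max_value + 1)
    for v in values:
        if not 0 <= v <= max_value:
            raise ValueError(f"value {v} outside [0, {max_value}]")
        counts[v] += 1  # tally occurrences; the value itself is the index
    out = []
    for v, c in enumerate(counts):
        out.extend([v] * c)  # emit each value as many times as it was seen
    return out
```

For example, `counting_sort([4, 1, 3, 1, 0], max_value=4)` returns `[0, 1, 1, 3, 4]`. The O(n + max_value) cost is why this style of sort suits small, fixed value ranges.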