A programming model for GPU load balancing

M Osama, SD Porumbescu, JD Owens - Proceedings of the 28th ACM …, 2023 - dl.acm.org
We propose a GPU fine-grained load-balancing abstraction that decouples load balancing
from work processing and aims to support both static and dynamic schedules with a …

Improving cryptanalytic applications with stochastic runtimes on GPUs and multicores

L Oden, J Keller - Parallel Computing, 2022 - Elsevier
We investigate cryptanalytic applications comprised of many independent tasks that exhibit
a stochastic runtime distribution. We compare four algorithms for executing such …

Extracting SIMD parallelism from recursive task-parallel programs

B Ren, S Balakrishna, Y Jo, S Krishnamoorthy… - ACM Transactions on …, 2019 - dl.acm.org
The pursuit of computational efficiency has led to the proliferation of throughput-oriented
hardware, from GPUs to increasingly wide vector units on commodity processors and …

High-performance tensor decoder on GPUs for wireless camera networks in IoT

H Li, T Zhang, R Zhang, XY Liu - 2019 IEEE 21st International …, 2019 - ieeexplore.ieee.org
With the rapid development of the Internet of Things, tensor-based coding and decoding
algorithms are widely used in wireless camera networks. Recently, a novel video decoder …

Improving cryptanalytic applications with stochastic runtimes on GPUs

L Oden, J Keller - 2021 IEEE International Parallel and …, 2021 - ieeexplore.ieee.org
We investigate cryptanalytic applications comprised of many independent tasks that exhibit
a stochastic runtime distribution. We compare four algorithms for executing such …

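CUDA warp-marching volume rendering with dynamic task allocation

孙万捷, 高瞻, 潘海燕, 王杰华, 蒋峥峥 - Journal of Computer-Aided Design & Computer Graphics, 2016 - jcad.cn
In standard CUDA ray-casting volume rendering, uneven workloads among the threads of a warp cause warp divergence
and low utilization of computing resources. To address this, a CUDA warp-marching algorithm is proposed. First, the standard CUDA …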

High-performance homomorphic matrix completion on GPUs

H Lu, T Zhang, XY Liu - … Conference on Smart City; IEEE 5th …, 2019 - ieeexplore.ieee.org
Data loss and privacy exposure are two major issues in many applications such as trajectory
tracking and online recommendations. The homomorphic matrix completion (HMC) is a …

Efficient incremental pagerank of evolving graphs on GPU

T Zhang - 2017 International Conference on Computer Systems …, 2017 - ieeexplore.ieee.org
Real-world graphs such as social graphs are evolving over time. When a graph changes, its
historical result becomes invalid and needs re-computing. However, performing re …

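A bounding volume hierarchy construction algorithm based on dynamic task scheduling

张正昌, 何发智, 周毅 - Journal of Computer-Aided Design & Computer Graphics, 2018 - jcad.cn
Intersection computation is the most expensive part of ray tracing, and the bounding volume hierarchy (BVH) is the mainstream acceleration structure.
To speed up BVH construction, a BVH construction algorithm based on dynamic task scheduling and warp thread optimization is proposed …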

Efficient graph mining on heterogeneous platforms in the cloud

T Zhang, W Tong, W Shen, J Peng, Z Niu - … 25–26, and December 15–16 …, 2018 - Springer
In this Big Data era, many large-scale and complex graphs have been produced
with the rapid growth of novel Internet applications and the new experiment data collecting …