Authors
Hai Li, Tao Zhang, Runjia Zhang, Xiao-Yang Liu
Publication date
2019/8/10
Conference
2019 IEEE 21st International Conference on High Performance Computing and Communications; IEEE 17th International Conference on Smart City; IEEE 5th International Conference on Data Science and Systems (HPCC/SmartCity/DSS)
Pages
1619-1626
Publisher
IEEE
Description
With the rapid development of the Internet of Things, tensor-based coding and decoding algorithms are widely used in wireless camera networks. Recently, a novel video decoder based on the low-tubal-rank tensor model has been proposed, which achieves better quality of services than conventional schemes. However, the tensor decoding algorithm is compute-intensive, rendering it impractical for real-time applications. In this paper, we propose effective strategies to accelerate the tensor decoder on GPUs (Graphics Processing Units). We implement the tensor decoding algorithm on the GPU architecture, and propose optimization strategies to eliminate data reorganizing overhead, provide batched complex matrix computations, and reduce memory consumption as well as computation overhead. With real video data, the GPU algorithm achieves an average of 237.44× and up to 312.39× speedups on a Tesla …
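The "batched complex matrix computations" mentioned in the abstract can be illustrated with the tensor product commonly used in the low-tubal-rank model: an FFT along the third dimension, one complex matrix product per frequency slice, and an inverse FFT. The following is a minimal CPU sketch in NumPy, not the authors' GPU implementation; the function names `t_product` and `t_product_reference` are hypothetical and chosen only for this illustration.

```python
import numpy as np


def t_product(A, B):
    """Tensor product of A (n1 x n2 x n3) and B (n2 x n4 x n3) via the FFT.

    Assumption: this is the standard t-product of the low-tubal-rank
    tensor model, not code taken from the paper.
    """
    # Transform every tube (mode-3 fiber) into the frequency domain.
    Af = np.fft.fft(A, axis=2)
    Bf = np.fft.fft(B, axis=2)
    # One complex matrix product per frequency slice, done as a single
    # batched call: slices are moved to the leading (batch) axis.
    Cf = np.matmul(np.moveaxis(Af, 2, 0), np.moveaxis(Bf, 2, 0))
    # Move the batch axis back and return to the original domain.
    C = np.fft.ifft(np.moveaxis(Cf, 0, 2), axis=2)
    # Real inputs give a real result up to rounding error.
    return C.real


def t_product_reference(A, B):
    """Same product computed directly as a circular convolution of slices."""
    n1, _, n3 = A.shape
    n4 = B.shape[1]
    C = np.zeros((n1, n4, n3))
    for k in range(n3):
        for j in range(n3):
            C[:, :, k] += A[:, :, j] @ B[:, :, (k - j) % n3]
    return C
```

The batched form replaces n3 separate matrix products with one call over a stack of slices, which is the kind of operation a GPU library can execute in a single launch; the reference function is included only to check the result.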
Total citations
[Chart: citations per year, 2019 to 2024]
Scholar articles
H Li, T Zhang, R Zhang, XY Liu - 2019 IEEE 21st International Conference on High …, 2019