Minimal multi-threading: Finding and removing redundant instructions in multi-threaded processors

N Brunie, C Collange, G Diamos - ACM SIGARCH Computer Architecture …, 2012 - dl.acm.org

Single-Instruction Multiple-Thread (SIMT) micro-architectures implemented in Graphics
Processing Units (GPUs) run fine-grained threads in lockstep by grouping them into units …

Cited by 140 · Related articles · All 18 versions

R2D2: Removing ReDunDancy Utilizing Linearity of Address Generation in GPUs

D Ha, Y Oh, WW Ro - Proceedings of the 50th Annual International …, 2023 - dl.acm.org

A generally used GPU programming methodology is that adjacent threads access data in
neighbor or specific-stride memory addresses and perform computations with the fetched …

Cited by 3 · Related articles · All 2 versions

Rhythm: Harnessing data parallel hardware for server workloads

SR Agrawal, V Pistol, J Pang, J Tran, D Tarjan… - ACM SIGPLAN …, 2014 - dl.acm.org

Trends in increasing web traffic demand an increase in server throughput while preserving
energy efficiency and total cost of ownership. Present work in optimizing data center …

Cited by 53 · Related articles · All 9 versions

Compile-time function memoization

A Suresh, E Rohou, A Seznec - … of the 26th International Conference on …, 2017 - dl.acm.org

Memoization is the technique of saving the results of computations so that future executions
can be omitted when the same inputs repeat. Recent work showed that memoization can be …

Cited by 36 · Related articles · All 5 versions

Temporal SIMT execution optimization through elimination of redundant operations

RM Krashinsky - US Patent 9,830,156, 2017 - Google Patents

One embodiment of the present invention sets forth a technique for optimizing parallel
thread execution in a temporal single-instruction multiple thread (SIMT) architecture. When …

Cited by 51 · Related articles · All 4 versions

Exploiting uniform vector instructions for GPGPU performance, energy efficiency, and opportunistic reliability enhancement

P Xiang, Y Yang, M Mantor, N Rubin, LR Hsu… - Proceedings of the 27th …, 2013 - dl.acm.org

State-of-art graphics processing units (GPUs) employ the single-instruction multiple-data
(SIMD) style execution to achieve both high computational throughput and energy efficiency …

Cited by 48 · Related articles · All 6 versions

Microarchitectural mechanisms to exploit value structure in SIMT architectures

J Kim, C Torng, S Srinath, D Lockhart… - Proceedings of the 40th …, 2013 - dl.acm.org

SIMT architectures improve performance and efficiency by exploiting control and memory-
access structure across data-parallel threads. Value structure occurs when multiple threads …

Cited by 45 · Related articles · All 11 versions

Dynamic inter-thread vectorization architecture: extracting DLP from TLP

S Kalathingal, C Collange, BN Swamy… - … Architecture and High …, 2016 - ieeexplore.ieee.org

Threads of Single-Program Multiple-Data (SPMD) applications often execute the same
instructions on different data. We propose the Dynamic Inter-Thread Vectorization …

Cited by 32 · Related articles · All 8 versions

big.VLITTLE: On-demand data-parallel acceleration for mobile systems on chip

T Ta, K Al-Hawaj, N Cebry, Y Ou, E Hall… - 2022 55th IEEE/ACM …, 2022 - ieeexplore.ieee.org

Single-ISA heterogeneous multi-core architectures offer a compelling high-performance and
high-efficiency solution to executing task-parallel workloads in mobile systems on chip …

Cited by 3 · Related articles · All 6 versions

SIMT-X: Extending single-instruction multi-threading to out-of-order cores

A Tino, C Collange, A Seznec - ACM Transactions on Architecture and …, 2020 - dl.acm.org

This work introduces Single Instruction Multi-Thread Express (SIMT-X), a general-purpose
Central Processing Unit (CPU) microarchitecture that enables Graphics Processing Units …

Cited by 11 · Related articles · All 10 versions
