[BOOK][B] Automatic performance tuning of sparse matrix kernels

RW Vuduc - 2003 - search.proquest.com
This dissertation presents an automated system to generate highly efficient, platform-
adapted implementations of sparse matrix kernels. We show that conventional …

Counter-based cache replacement and bypassing algorithms

M Kharbutli, Y Solihin - IEEE Transactions on Computers, 2008 - ieeexplore.ieee.org
Recent studies have shown that, in highly associative caches, the performance gap between
the least recently used (LRU) and the theoretical optimal replacement algorithms is large …
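To make the LRU-versus-optimal gap concrete, here is a minimal sketch that counts misses for LRU and for Belady's OPT on a single fully associative set. It is not the counter-based algorithm from this paper. The trace, the capacity of 4, and the function names are hypothetical and chosen only for illustration.

```python
from collections import OrderedDict


def lru_misses(trace, capacity):
    """Count misses for a single fully associative set managed by LRU."""
    cache = OrderedDict()
    misses = 0
    for block in trace:
        if block in cache:
            cache.move_to_end(block)  # hit: mark as most recently used
        else:
            misses += 1
            if len(cache) >= capacity:
                cache.popitem(last=False)  # evict the least recently used block
            cache[block] = True
    return misses


def next_use(trace, pos, block):
    """Index of the next reference to `block` after `pos`, or infinity if none."""
    for j in range(pos + 1, len(trace)):
        if trace[j] == block:
            return j
    return float("inf")


def opt_misses(trace, capacity):
    """Count misses for Belady's OPT: evict the block whose next use is farthest away."""
    cache = set()
    misses = 0
    for pos, block in enumerate(trace):
        if block in cache:
            continue
        misses += 1
        if len(cache) >= capacity:
            victim = max(cache, key=lambda b: next_use(trace, pos, b))
            cache.remove(victim)
        cache.add(block)
    return misses


# Hypothetical trace: 5 blocks accessed cyclically, 10 times, through a 4-entry set.
trace = [0, 1, 2, 3, 4] * 10
print("LRU misses:", lru_misses(trace, 4))  # 50
print("OPT misses:", opt_misses(trace, 4))  # 16
```

On this cyclic trace, which touches one more block than the set can hold, LRU evicts each block just before it is reused and misses on all 50 accesses, while OPT misses on only 16.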

A case for richer cross-layer abstractions: Bridging the semantic gap with expressive memory

N Vijaykumar, A Jain, D Majumdar… - 2018 ACM/IEEE 45th …, 2018 - ieeexplore.ieee.org
This paper makes a case for a new cross-layer interface, Expressive Memory (XMem), to
communicate higher-level program semantics from the application to the system software …

Counter-based cache replacement algorithms

M Kharbutli, Y Solihin - 2005 International Conference on …, 2005 - ieeexplore.ieee.org
Recent studies have shown that in highly associative caches, the performance gap between
the least recently used (LRU) and the theoretical optimal replacement algorithms is large …

HAWS: Accelerating GPU wavefront execution through selective out-of-order execution

X Gong, X Gong, L Yu, D Kaeli - ACM Transactions on Architecture and …, 2019 - dl.acm.org
Graphics Processing Units (GPUs) have become an attractive platform for accelerating
challenging applications on a range of platforms, from High Performance Computing (HPC) …

Optimum code generation method and compiler device for multiprocessor

K Takayama, N Sukegawa - US Patent 8,296,746, 2012 - Google Patents
The present invention relates to an optimum code generation method and a compiler device
for a multiprocessor. More particularly, it relates to a method of generating codes for parallel …

[PDF] Read Optimized file system designs: A performance evaluation

MI Seltzer, M Stonebraker - ICDE, 1991 - Citeseer
This paper presents a performance comparison of several file system allocation policies.
The file systems are designed to provide high bandwidth between disks and main memory …

[PDF] Analytical computation of Ehrhart polynomials and its applications for embedded systems

S Verdoolaege, K Beyls, M Bruynooghe, R Seghir… - CW …, 2004 - lirias.kuleuven.be
Many optimization techniques, including several targeted specifically at embedded systems,
depend on the ability to calculate the number of points in a parameterized polytope. It is well …
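As a small illustration of what counting "the number of points in a parameterized polytope" means, consider a hypothetical loop nest `for i in 0..N: for j in 0..i`. Its iterations are the integer points of the triangle 0 ≤ j ≤ i ≤ N, and their count is the Ehrhart polynomial (N+1)(N+2)/2 = N²/2 + 3N/2 + 1. The sketch below is not the analytical method of this paper; it only checks that hand-derived closed form against brute-force enumeration. The function names are hypothetical.

```python
def count_points(n):
    """Brute-force count of integer points (i, j) with 0 <= j <= i <= n."""
    return sum(1 for i in range(n + 1) for j in range(i + 1))


def ehrhart_triangle(n):
    """Closed form for the same count: (n + 1)(n + 2) / 2."""
    return (n + 1) * (n + 2) // 2


# Compare enumeration with the polynomial for a few parameter values.
for n in range(6):
    print(n, count_points(n), ehrhart_triangle(n))
```

Both columns agree (1, 3, 6, 10, 15, 21 for N = 0..5). The point of computing the polynomial analytically is that the count is then available for any N without enumerating the iterations.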

Compiler-directed resource management for active code regions

R Sree, A Settle, I Bratt… - Seventh Workshop on …, 2003 - ieeexplore.ieee.org
Recent studies on program execution behavior reveal that a large amount of execution time
is spent in small frequently executed regions of code. Whereas adaptive cache management …

Static cache hint generation based on a profile of the OPT cache replacement

T Xingyan, D Hongyan - 2010 International Conference on …, 2010 - ieeexplore.ieee.org
Caches have an increasing impact on overall performance because of the growing gap
between CPU cycle times and memory access times. Therefore, improving the cache …