Optimization of sparse matrix-vector multiplication on emerging multicore platforms. S Williams, L Oliker, R Vuduc, J Shalf, K Yelick, J Demmel. Proceedings of the 2007 ACM/IEEE Conference on Supercomputing, 1-12, 2007 | 1056 | 2007 |
OSKI: A library of automatically tuned sparse matrix kernels. R Vuduc, JW Demmel, KA Yelick. Journal of Physics: Conference Series 16 (1), 521, 2005 | 724 | 2005 |
Model-driven autotuning of sparse matrix-vector multiply on GPUs. JW Choi, A Singh, RW Vuduc. ACM SIGPLAN Notices 45 (5), 115-126, 2010 | 566 | 2010 |
Sparsity: Optimization framework for sparse matrix kernels. EJ Im, K Yelick, R Vuduc. The International Journal of High Performance Computing Applications 18 (1 …, 2004 | 427 | 2004 |
Automatic performance tuning of sparse matrix kernels. RW Vuduc. University of California, Berkeley, 2003 | 357 | 2003 |
Self-adapting linear algebra algorithms and software. J Demmel, J Dongarra, V Eijkhout, E Fuentes, A Petitet, R Vuduc, ... Proceedings of the IEEE 93 (2), 293-312, 2005 | 275 | 2005 |
A performance analysis framework for identifying potential benefits in GPGPU applications. J Sim, A Dasgupta, H Kim, R Vuduc. Proceedings of the 17th ACM SIGPLAN symposium on Principles and Practice of …, 2012 | 269 | 2012 |
A massively parallel adaptive fast-multipole method on heterogeneous architectures. I Lashuk, A Chandramowlishwaran, H Langston, TA Nguyen, R Sampath, ... Proceedings of the Conference on High Performance Computing Networking …, 2009 | 251 | 2009 |
Fast sparse matrix-vector multiplication by exploiting variable block structure. R Vuduc, HJ Moon. High Performance Computing and Communications, 807-816, 2005 | 222 | 2005 |
Petascale direct numerical simulation of blood flow on 200k cores and heterogeneous architectures. A Rahimian, I Lashuk, S Veerapaneni, A Chandramowlishwaran, ... SC'10: Proceedings of the 2010 ACM/IEEE International Conference for High …, 2010 | 218 | 2010 |
Falcon: fault localization in concurrent programs. S Park, RW Vuduc, MJ Harrold. Proceedings of the 32nd ACM/IEEE International Conference on Software …, 2010 | 206 | 2010 |
Performance optimizations and bounds for sparse matrix-vector multiply. R Vuduc, JW Demmel, KA Yelick, S Kamil, R Nishtala, B Lee. SC'02: Proceedings of the 2002 ACM/IEEE Conference on Supercomputing, 26-26, 2002 | 203 | 2002 |
A roofline model of energy. JW Choi, D Bedard, R Fowler, R Vuduc. 2013 IEEE 27th International Symposium on Parallel and Distributed …, 2013 | 196 | 2013 |
Many-thread aware prefetching mechanisms for GPGPU applications. J Lee, NB Lakshminarayana, H Kim, R Vuduc. 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture, 213-224, 2010 | 190 | 2010 |
When prefetching works, when it doesn’t, and why. J Lee, H Kim, R Vuduc. ACM Transactions on Architecture and Code Optimization (TACO) 9 (1), 1-29, 2012 | 189 | 2012 |
On the limits of GPU acceleration. R Vuduc, A Chandramowlishwaran, J Choi, M Guney, A Shringarpure. Proceedings of the 2nd USENIX Conference on Hot Topics in Parallelism 13 (0), 2010 | 178 | 2010 |
FROSTT: The formidable repository of open sparse tensors and tools. S Smith, JW Choi, J Li, R Vuduc, J Park, X Liu, G Karypis | 158 | 2017 |
POET: Parameterized optimizations for empirical tuning. Q Yi, K Seymour, H You, R Vuduc, D Quinlan. 2007 IEEE International Parallel and Distributed Processing Symposium, 1-8, 2007 | 158 | 2007 |
Statistical models for empirical search-based performance tuning. R Vuduc, JW Demmel, JA Bilmes. International Journal of High Performance Computing Applications 18 (1), 65-94, 2004 | 154 | 2004 |
When cache blocking of sparse matrix vector multiply works and why. R Nishtala, RW Vuduc, JW Demmel, KA Yelick. Applicable Algebra in Engineering, Communication and Computing 18, 297-311, 2007 | 149 | 2007 |