Analyzing CUDA Workloads Using a Detailed GPU Simulator A Bakhoda, GL Yuan, WWL Fung, H Wong, TM Aamodt Performance Analysis of Systems and Software, 2009. ISPASS 2009. IEEE …, 2009 | 2066 | 2009 |
Cnvlutin: Ineffectual-neuron-free deep neural network computing J Albericio, P Judd, T Hetherington, T Aamodt, NE Jerger, A Moshovos ACM SIGARCH Computer Architecture News 44 (3), 1-13, 2016 | 889 | 2016 |
GPUWattch: Enabling Energy Optimizations in GPGPUs J Leng, T Hetherington, A ElTantawy, S Gilani, NS Kim, TM Aamodt, ... Proceedings of the 40th Annual International Symposium on Computer …, 2013 | 759 | 2013 |
Dynamic Warp Formation and Scheduling for Efficient GPU Control Flow WWL Fung, I Sham, G Yuan, TM Aamodt Proceedings of the 40th Annual IEEE/ACM International Symposium on …, 2007 | 640 | 2007 |
Stripes: Bit-serial deep neural network computing P Judd, J Albericio, T Hetherington, TM Aamodt, A Moshovos 2016 49th Annual IEEE/ACM International Symposium on Microarchitecture …, 2016 | 548 | 2016 |
Cache-Conscious Wavefront Scheduling TG Rogers, M O'Connor, TM Aamodt Proceedings of the 2012 45th Annual IEEE/ACM International Symposium on …, 2012 | 545 | 2012 |
Thread Block Compaction for Efficient SIMT Control Flow WWL Fung, TM Aamodt High Performance Computer Architecture (HPCA), 2011 IEEE 17th International …, 2011 | 270 | 2011 |
Accel-Sim: An Extensible Simulation Framework for Validated GPU Modeling M Khairy, Z Shen, TM Aamodt, TG Rogers 2020 ACM/IEEE 47th Annual International Symposium on Computer Architecture …, 2020 | 219 | 2020 |
Cache Coherence for GPU Architectures I Singh, A Shriraman, WWL Fung, M O'Connor, TM Aamodt High Performance Computer Architecture (HPCA), 2013 IEEE 19th International …, 2013 | 210 | 2013 |
Throughput-effective on-chip networks for manycore accelerators A Bakhoda, J Kim, TM Aamodt 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture, 421-432, 2010 | 207 | 2010 |
Divergence-Aware Warp Scheduling TG Rogers, M O'Connor, TM Aamodt Proceedings of the 46th Annual IEEE/ACM International Symposium on …, 2013 | 192 | 2013 |
Complexity effective memory access scheduling for many-core accelerator architectures GL Yuan, A Bakhoda, TM Aamodt Proceedings of the 42nd Annual IEEE/ACM International Symposium on …, 2009 | 160 | 2009 |
Characterizing and evaluating a key-value store application on heterogeneous CPU-GPU systems TH Hetherington, TG Rogers, L Hsu, M O'Connor, TM Aamodt 2012 IEEE International Symposium on Performance Analysis of Systems …, 2012 | 149 | 2012 |
Hardware Transactional Memory for GPU Architectures WWL Fung, I Singh, A Brownsword, TM Aamodt Proceedings of the 44th Annual IEEE/ACM International Symposium on …, 2011 | 133 | 2011 |
Reduced-Precision Strategies for Bounded Memory in Deep Neural Nets P Judd, J Albericio, T Hetherington, T Aamodt, NE Jerger, R Urtasun, ... arXiv preprint arXiv:1511.05236, 2015 | 132 | 2015 |
Proteus: Exploiting numerical precision variability in deep neural networks P Judd, J Albericio, T Hetherington, TM Aamodt, NE Jerger, A Moshovos Proceedings of the 2016 International Conference on Supercomputing, 1-12, 2016 | 125 | 2016 |
Speculative multi-threading for instruction prefetch and/or trace pre-build H Wang, TM Aamodt, P Marcuello, JW Stark IV, JP Shen, A González, ... US Patent 7,814,469, 2010 | 121 | 2010 |
A first-order fine-grained multithreaded throughput model XE Chen, TM Aamodt 2009 IEEE 15th International Symposium on High Performance Computer …, 2009 | 116 | 2009 |
Dynamic warp formation: Efficient MIMD control flow on SIMD graphics hardware WWL Fung, I Sham, G Yuan, TM Aamodt ACM Transactions on Architecture and Code Optimization (TACO) 6 (2), 1-37, 2009 | 111 | 2009 |