Summarizer: trading communication with computing near storage G Koo*, KK Matam*, I Te, HVKG Narra, J Li, HW Tseng, S Swanson, ... 2017 50th Annual IEEE/ACM International Symposium on Microarchitecture …, 2017 | 161 | 2017 |
Sparse matrix-matrix multiplication on modern architectures K Matam, SRKB Indarapu, K Kothapalli 2012 19th International Conference on High Performance Computing, 1-10, 2012 | 93* | 2012 |
Software-hardware co-design for fast and scalable training of deep learning recommendation models D Mudigere, Y Hao, J Huang, Z Jia, A Tulloch, S Sridharan, X Liu, ... Proceedings of the 49th Annual International Symposium on Computer …, 2022 | 82 | 2022 |
Accelerating sparse matrix vector multiplication in iterative methods using GPU KK Matam, K Kothapalli 2011 International Conference on Parallel Processing, 612-621, 2011 | 68 | 2011 |
GraphSSD: graph semantics aware SSD KK Matam, G Koo, H Zha, HW Tseng, M Annavaram Proceedings of the 46th international symposium on computer architecture …, 2019 | 66 | 2019 |
Check-N-Run: A checkpointing system for training deep learning recommendation models A Eisenman, KK Matam, S Ingram, D Mudigere, R Krishnamoorthi, K Nair, ... 19th USENIX Symposium on Networked Systems Design and Implementation (NSDI …, 2022 | 48 | 2022 |
High throughput and programmable online traffic classifier on FPGA D Tong, L Sun, K Matam, V Prasanna Proceedings of the ACM/SIGDA international symposium on Field programmable …, 2013 | 43 | 2013 |
D Mudigere, Y Hao, J Huang, Z Jia, A Tulloch, S Sridharan, X Liu, ... M. Khorashadi, P. Bhattacharya, P. Lapukhov, M. Naumov, L. Qiao, M. Smelyanskiy, B. Jia, and V …, 2021 | 38 | 2021 |
High-performance, distributed training of large-scale deep learning recommendation models D Mudigere, Y Hao, J Huang, A Tulloch, S Sridharan, X Liu, M Ozdal, ... arXiv preprint arXiv:2104.05158, 2021 | 33 | 2021 |
First-generation inference accelerator deployment at Facebook M Anderson, B Chen, S Chen, S Deng, J Fix, M Gschwind, A Kalaiah, ... arXiv preprint arXiv:2107.04140, 2021 | 32 | 2021 |
CPU and/or GPU: Revisiting the GPU vs. CPU myth K Kothapalli, DS Banerjee, PJ Narayanan, S Sood, AK Bahl, S Sharma, ... arXiv preprint arXiv:1303.2171, 2013 | 16 | 2013 |
GPU accelerated Lanczos algorithm with applications KK Matam, K Kothapalli 2011 IEEE Workshops of International Conference on Advanced Information …, 2011 | 16 | 2011 |
Energy-efficient large-scale matrix multiplication on FPGAs KK Matam, VK Prasanna 2013 International Conference on Reconfigurable Computing and FPGAs …, 2013 | 11 | 2013 |
Efficient Discrete Range Searching primitives on the GPU with applications J Soman, MK Kumar, K Kothapalli, PJ Narayanan High Performance Computing (HiPC), 2010 International Conference on, 1-10, 2010 | 11 | 2010 |
Evaluating energy efficiency of floating point matrix multiplication on FPGAs KK Matam, H Le, VK Prasanna 2013 IEEE High Performance Extreme Computing Conference (HPEC), 1-6, 2013 | 8 | 2013 |
G Koo, KK Matam, T. I, HKG Narra, J. Li, H. W. Tseng, S. Swanson, and M. Annavaram, “Summarizer: Trading communication …, 2017 | 7 | 2017 |
Check-n-run: A checkpointing system for training recommendation models A Eisenman, KK Matam, S Ingram, D Mudigere, R Krishnamoorthi, ... arXiv preprint arXiv:2010.08679 5, 2020 | 6 | 2020 |
Efficient automatic parallelization of a single GPU program for a multiple GPU system MK Kumar, MR Abdel-Majeed, M Annavaram Integration 66, 35-43, 2019 | 6 | 2019 |
Energy efficient architecture for matrix multiplication on FPGAs KK Matam, H Le, VK Prasanna 2013 23rd International Conference on Field Programmable Logic and …, 2013 | 5 | 2013 |
MultiLogVC: efficient out-of-core graph processing framework for flash storage KK Matam, H Hashemi, M Annavaram 2021 IEEE International Parallel and Distributed Processing Symposium (IPDPS …, 2021 | 4 | 2021 |