Title | Authors | Venue | Cited by | Year |
Diesel: DSL for linear algebra and neural net computations on GPUs | V Elango, N Rubin, M Ravishankar, H Sandanagobalane, V Grover | Proceedings of the 2nd ACM SIGPLAN International Workshop on Machine …, 2018 | 62 | 2018 |
Distributed memory code generation for mixed irregular/regular computations | M Ravishankar, R Dathathri, V Elango, LN Pouchet, J Ramanujam, ... | Proceedings of the 20th ACM SIGPLAN Symposium on Principles and Practice of …, 2015 | 41 | 2015 |
On characterizing the data access complexity of programs | V Elango, F Rastello, LN Pouchet, J Ramanujam, P Sadayappan | Proceedings of the 42nd Annual ACM SIGPLAN-SIGACT Symposium on Principles of …, 2015 | 33 | 2015 |
Accelerating Strassen-Winograd's matrix multiplication algorithm on GPUs | PW Lai, H Arafat, V Elango, P Sadayappan | 20th Annual international conference on high performance computing, 139-148, 2013 | 31 | 2013 |
Spatial adaptive sampling in multiscale simulation | B Rouet-Leduc, K Barros, E Cieren, V Elango, C Junghans, T Lookman, ... | Computer Physics Communications 185 (7), 1857-1864, 2014 | 27 | 2014 |
Beyond reuse distance analysis: Dynamic analysis for characterization of data locality potential | N Fauzia, V Elango, M Ravishankar, J Ramanujam, F Rastello, A Rountev, ... | ACM Transactions on Architecture and Code Optimization (TACO) 10 (4), 1-29, 2013 | 27 | 2013 |
Accelerating linear algebra kernels for any processor architecture | V Elango, N Rubin, M Ravishankar, VK Grover | US Patent App. 16/277,661, 2019 | 19 | 2019 |
On characterizing the data movement complexity of computational DAGs for parallel execution | V Elango, F Rastello, LN Pouchet, J Ramanujam, P Sadayappan | Proceedings of the 26th ACM Symposium on Parallelism in Algorithms and …, 2014 | 19 | 2014 |
With shared microexponents, a little shifting goes a long way | B Darvish Rouhani, R Zhao, V Elango, R Shafipour, M Hall, ... | Proceedings of the 50th Annual International Symposium on Computer …, 2023 | 17 | 2023 |
Data Access Complexity: The Red/Blue Pebble Game Revisited | V Elango, F Rastello, LN Pouchet, J Ramanujam, P Sadayappan | | 15 | 2013 |
Microscaling data formats for deep learning | BD Rouhani, R Zhao, A More, M Hall, A Khodamoradi, S Deng, ... | arXiv preprint arXiv:2310.10537, 2023 | 12 | 2023 |
On using the roofline model with lower bounds on data movement | V Elango, N Sedaghati, F Rastello, LN Pouchet, J Ramanujam, ... | ACM Transactions on Architecture and Code Optimization (TACO) 11 (4), 1-23, 2015 | 12 | 2015 |
Pase: Parallelization strategies for efficient DNN training | V Elango | 2021 IEEE International Parallel and Distributed Processing Symposium (IPDPS …, 2021 | 7 | 2021 |
Techniques for Characterizing the Data Movement Complexity of Computations | V Elango | The Ohio State University, 2016 | 3 | 2016 |
Hierarchical and shared exponent floating point data types | BD Rouhani, V Elango, R Shafipour, J Fowers, MG Liu, J Xi, DC Burger, ... | US Patent 11,886,833, 2024 | | 2024 |
Systems and methods for sparse matrix multiplication | V Elango, BD Rouhani, ES Chung, DC Burger | US Patent App. 17/657,912, 2023 | | 2023 |
Microscaling Data Formats for Deep Learning | B Darvish Rouhani, R Zhao, A More, M Hall, A Khodamoradi, S Deng, ... | arXiv e-prints, arXiv: 2310.10537, 2023 | | 2023 |
Accelerating linear algebra kernels for any processor architecture | V Elango, N Rubin, M Ravishankar, V Grover | US Patent App. 18/136,233, 2023 | | 2023 |
Shared Microexponents: A Little Shifting Goes a Long Way | B Rouhani, R Zhao, V Elango, R Shafipour, M Hall, M Mesmakhosroshahi, ... | arXiv preprint arXiv:2302.08007, 2023 | | 2023 |
Sparsifying narrow data formats for neural networks | BD Rouhani, V Elango, ES Chung, DC Burger, MC Heddes, S Nishit, ... | US Patent App. 17/349,848, 2022 | | 2022 |