Deep learning inference in Facebook data centers: Characterization, performance optimizations and hardware implications J Park, M Naumov, P Basu, S Deng, A Kalaiah, D Khudia, J Law, P Malani, ... arXiv preprint arXiv:1811.09886, 2018 | 205 | 2018 |
A script-based autotuning compiler system to generate high-performance CUDA code M Khan, P Basu, G Rudy, M Hall, C Chen, J Chame ACM Transactions on Architecture and Code Optimization (TACO) 9 (4), 1-25, 2013 | 93 | 2013 |
An empirical roofline methodology for quantitatively assessing performance portability C Yang, R Gayatri, T Kurth, P Basu, Z Ronaghi, A Adetokunbo, B Friesen, ... 2018 IEEE/ACM International Workshop on Performance, Portability and …, 2018 | 46 | 2018 |
Compiler-directed transformation for higher-order stencils P Basu, M Hall, S Williams, B Van Straalen, L Oliker, P Colella 2015 IEEE International Parallel and Distributed Processing Symposium, 313-323, 2015 | 44 | 2015 |
FBGEMM: Enabling high-performance low-precision deep learning inference D Khudia, J Huang, P Basu, S Deng, H Liu, J Park, M Smelyanskiy arXiv preprint arXiv:2101.05615, 2021 | 40 | 2021 |
Compiler Generation and Autotuning of Communication-Avoiding Operators for Geometric Multigrid P Basu, A Venkat, M Hall, S Williams, B Van Straalen, L Oliker High Performance Computing, 452-461 | 36* | |
Exploiting reuse and vectorization in blocked stencil computations on CPUs and GPUs T Zhao, P Basu, S Williams, M Hall, H Johansen Proceedings of the International Conference for High Performance Computing …, 2019 | 34 | 2019 |
Towards making autotuning mainstream P Basu, M Hall, M Khan, S Maindola, S Muralidharan, S Ramalingam, ... The International journal of high performance computing applications 27 (4 …, 2013 | 26 | 2013 |
Compiler-based code generation and autotuning for geometric multigrid on GPU-accelerated supercomputers P Basu, S Williams, B Van Straalen, L Oliker, P Colella, M Hall Parallel Computing 64, 50-64, 2017 | 23 | 2017 |
Snowflake: A lightweight portable stencil DSL N Zhang, M Driscoll, C Markley, S Williams, P Basu, A Fox 2017 IEEE International Parallel and Distributed Processing Symposium …, 2017 | 15 | 2017 |
Open-sourcing FBGEMM for state-of-the-art server-side inference DS Khudia, P Basu, S Deng engineering.fb.com/ml-applications/fbgemm, 2018 | 11 | 2018 |
Combining polyhedral and AST transformations in CHiLL H Zhang, A Venkat, P Basu, M Hall Proceedings of the Sixth International Workshop on Polyhedral Compilation …, 2016 | 8 | 2016 |
Deep Learning Inference in Facebook Data Centers: Characterization, Performance Optimizations and Hardware Implications J Park, M Naumov, P Basu, S Deng, A Kalaiah, D Khudia, J Law, P Malani, ... arXiv, 2018 | 5 | 2018 |
Converting Stencils to Accumulations for Communication-Avoiding Optimization in Geometric Multigrid P Basu, S Williams, B Van Straalen, L Oliker, M Hall Proceedings of the Second Workshop on Optimizing Stencil Computations, 9-16, 2014 | 5 | 2014 |
SIMD code generation for stencils on brick decompositions T Zhao, M Hall, P Basu, S Williams, H Johansen ACM SIGPLAN Notices 53 (1), 423-424, 2018 | 4 | 2018 |
Compiler optimizations and autotuning for stencils and geometric multigrid P Basu The University of Utah, 2016 | 3 | 2016 |
Automating compiler-directed autotuning for phased performance behavior T Rusira, M Hall, P Basu 2017 IEEE International Parallel and Distributed Processing Symposium …, 2017 | 1 | 2017 |
Compiler-based code generation and autotuning for geometric multigrid on GPU-accelerated P Basu, S Williams, B Van Straalen | | 2017 |
Polyhedral Compiler Technology in Collaboration with Autotuning Important to Domain-Specific Frameworks for HPC M Hall, P Basu Languages and Compilers for Parallel Computing: 29th International Workshop …, 2017 | | 2017 |