TASO: optimizing deep learning computation with automatic generation of graph substitutions Z Jia, O Padon, J Thomas, T Warszawski, M Zaharia, A Aiken Proceedings of the 27th ACM Symposium on Operating Systems Principles, 47-62, 2019 | 274 | 2019 |
Weld: A common runtime for high performance data analytics S Palkar, JJ Thomas, A Shanbhag, D Narayanan, H Pirk, M Schwarzkopf, ... Conference on Innovative Data Systems Research (CIDR) 19, 2017 | 181 | 2017 |
Evaluating end-to-end optimization for data analytics applications in Weld S Palkar, J Thomas, D Narayanan, P Thaker, R Palamuttam, P Negi, ... Proceedings of the VLDB Endowment 11 (9), 1002-1015, 2018 | 100 | 2018 |
Optimizing DNN computation with relaxed graph substitutions Z Jia, J Thomas, T Warszawski, M Gao, M Zaharia, A Aiken SysML 2019, 2019 | 89 | 2019 |
Fleet: A framework for massively parallel streaming on FPGAs J Thomas, P Hanrahan, M Zaharia Proceedings of the Twenty-Fifth International Conference on Architectural …, 2020 | 48 | 2020 |
Creating an agile hardware design flow R Bahr, C Barrett, N Bhagdikar, A Carsello, R Daly, C Donovick, D Durst, ... 2020 57th ACM/IEEE Design Automation Conference (DAC), 1-6, 2020 | 31 | 2020 |
Weld: Rethinking the interface between data-intensive applications S Palkar, J Thomas, D Narayanan, A Shanbhag, R Palamuttam, H Pirk, ... arXiv preprint arXiv:1709.06416, 2017 | 25 | 2017 |
Amber: A 367 GOPS, 538 GOPS/W 16nm SoC with a Coarse-Grained Reconfigurable Array for Flexible Acceleration of Dense Linear Algebra A Carsello, K Feng, T Kong, K Koul, Q Liu, J Melchert, G Nyengele, ... 2022 IEEE Symposium on VLSI Technology and Circuits (VLSI Technology and …, 2022 | 23 | 2022 |
Aha: An agile approach to the design of coarse-grained reconfigurable accelerators and compilers K Koul, J Melchert, K Sreedhar, L Truong, G Nyengele, K Zhang, Q Liu, ... ACM Transactions on Embedded Computing Systems 22 (2), 1-34, 2023 | 19 | 2023 |
Laika: Efficient in-place scheduling for 3D mesh graph computations P Gruevski, W Hasenplaugh, D Lugato, JJ Thomas Proceedings of the 30th on Symposium on Parallelism in Algorithms and …, 2018 | 6 | 2018 |
Software-like Compilation for Data Center FPGA Accelerators J Thomas, C Lavin, A Kaviani Proceedings of the 11th International Symposium on Highly Efficient …, 2021 | 5 | 2021 |
mflowgen: A modular flow generator and ecosystem for community-driven physical design A Carsello, J Thomas, A Nayak, PH Chen, M Horowitz, P Raina, C Torng Proceedings of the 59th ACM/IEEE Design Automation Conference, 1339-1342, 2022 | 4 | 2022 |
Nested vector language: Roofline performance for data parallel code S Palkar, J Thomas, M Zaharia | 4 | 2016 |
Enabling Reusable Physical Design Flows with Modular Flow Generators A Carsello, J Thomas, A Nayak, PH Chen, M Horowitz, P Raina, C Torng arXiv preprint arXiv:2111.14535, 2021 | 3 | 2021 |
Amber: Coarse-Grained Reconfigurable Array-Based SoC for Dense Linear Algebra Acceleration K Feng, A Carsello, T Kong, K Koul, Q Liu, J Melchert, G Nyengele, ... 2022 IEEE Hot Chips 34 Symposium (HCS), 1-30, 2022 | 2 | 2022 |
Amber: A 16-nm System-on-Chip With a Coarse-Grained Reconfigurable Array for Flexible Acceleration of Dense Linear Algebra K Feng, T Kong, K Koul, J Melchert, A Carsello, Q Liu, G Nyengele, ... IEEE Journal of Solid-State Circuits, 2023 | | 2023 |
Developing FPGAs as an Acceleration Platform for Data-Intensive Applications JJ Thomas Stanford University, 2022 | | 2022 |
Weld: fast data-parallel computation on modern hardware JJ Thomas Massachusetts Institute of Technology, 2016 | | 2016 |