GPipe: Efficient training of giant neural networks using pipeline parallelism Y Huang, Y Cheng, A Bapna, O Firat, D Chen, M Chen, HJ Lee, J Ngiam, ... Advances in neural information processing systems 32, 2019 | 1540 | 2019 |
LaMDA: Language models for dialog applications R Thoppilan, D De Freitas, J Hall, N Shazeer, A Kulshreshtha, HT Cheng, ... arXiv preprint arXiv:2201.08239, 2022 | 1245 | 2022 |
GShard: Scaling giant models with conditional computation and automatic sharding D Lepikhin, HJ Lee, Y Xu, D Chen, O Firat, Y Huang, M Krikun, N Shazeer, ... arXiv preprint arXiv:2006.16668, 2020 | 780 | 2020 |
MLPerf training benchmark P Mattson, C Cheng, G Diamos, C Coleman, P Micikevicius, D Patterson, ... Proceedings of Machine Learning and Systems 2, 336-349, 2020 | 312 | 2020 |
MapCG: Writing parallel program portable between CPU and GPU C Hong, D Chen, W Chen, W Zheng, H Lin Proceedings of the 19th international conference on Parallel architectures …, 2010 | 224 | 2010 |
Lingvo: a modular and scalable framework for sequence-to-sequence modeling J Shen, P Nguyen, Y Wu, Z Chen, MX Chen, Y Jia, A Kannan, T Sainath, ... arXiv preprint arXiv:1902.08295, 2019 | 202 | 2019 |
Image classification at supercomputer scale C Ying, S Kumar, D Chen, T Wang, Y Cheng arXiv preprint arXiv:1811.06992, 2018 | 149 | 2018 |
AutoFDO: Automatic feedback-directed optimization for warehouse-scale applications D Chen, DX Li, T Moseley Proceedings of the 2016 International Symposium on Code Generation and …, 2016 | 116 | 2016 |
GSPMD: general and scalable parallelization for ML computation graphs Y Xu, HJ Lee, D Chen, B Hechtman, Y Huang, R Joshi, M Krikun, ... arXiv preprint arXiv:2105.04663, 2021 | 98 | 2021 |
Renelito Delos Santos R Thoppilan, D De Freitas, J Hall, N Shazeer, A Kulshreshtha, HT Cheng, ... | 94 | 2022 |
Taming hardware event samples for FDO compilation D Chen, N Vachharajani, R Hundt, S Liao, V Ramasamy, P Yuan, W Chen, ... Proceedings of the 8th annual IEEE/ACM international symposium on Code …, 2010 | 86 | 2010 |
Tree partition based parallel frequent pattern mining on shared memory systems D Chen, C Lai, W Hu, WG Chen, Y Zhang, W Zheng Proceedings 20th IEEE International Parallel & Distributed Processing …, 2006 | 53 | 2006 |
Taming hardware event samples for precise and versatile feedback directed optimizations D Chen, N Vachharajani, R Hundt, X Li, S Eranian, W Chen, W Zheng IEEE Transactions on Computers 62 (2), 376-389, 2011 | 49 | 2011 |
Scale MLPerf-0.6 models on Google TPU-v3 Pods S Kumar, V Bitorff, D Chen, C Chou, B Hechtman, HJ Lee, N Kumar, ... arXiv preprint arXiv:1909.09756, 2019 | 38 | 2019 |
Overlap communication with dependent computation via decomposition in large deep learning models S Wang, J Wei, A Sabne, A Davis, B Ilbeyi, B Hechtman, D Chen, ... Proceedings of the 28th ACM International Conference on Architectural …, 2022 | 32 | 2022 |
Automatic cross-replica sharding of weight update in data-parallel training Y Xu, HJ Lee, D Chen, H Choi, B Hechtman, S Wang arXiv preprint arXiv:2004.13336, 2020 | 27 | 2020 |
Feedback-directed optimizations in GCC with estimated edge profiles from hardware event sampling V Ramasamy, P Yuan, D Chen, R Hundt Proceedings of GCC Summit, 87-102, 2008 | 22 | 2008 |
Providing source code level portability between CPU and GPU with MapCG CT Hong, DH Chen, YB Chen, WG Chen, WM Zheng, HB Lin Journal of Computer Science and Technology 27 (1), 42-56, 2012 | 21 | 2012 |
Compile-time feedback-directed optimizations using estimated edge profiles from hardware-event sampling R Hundt, V Ramasamy, D Chen US Patent 8,387,026, 2013 | 20 | 2013 |
Exploring the limits of Concurrency in ML Training on Google TPUs S Kumar, Y Wang, C Young, J Bradbury, N Kumar, D Chen, A Swing Proceedings of Machine Learning and Systems 3, 81-92, 2021 | 18 | 2021 |