Sudarshan Srinivasan
Verified email at intel.com
Title · Cited by · Year
Instructions and logic for vector multiply add with zero skipping
S Pal, S Avancha, I Bhati, WY Chen, D Das, A Garg, CS Gurram, J Gu, ...
US Patent 11,669,329, 2023
2023
Astra-sim 2.0: Modeling hierarchical networks and disaggregated systems for large-model training at scale
W Won, T Heo, S Rashidi, S Sridharan, S Srinivasan, T Krishna
2023 IEEE International Symposium on Performance Analysis of Systems and …, 2023
Cited by 25 · 2023
TACOS: Topology-aware collective algorithm synthesizer for distributed training
W Won, M Elavazhagan, S Srinivasan, A Durg, S Gupta, T Krishna
arXiv preprint arXiv:2304, 2023
Cited by 3 · 2023
A highly-efficient error detection technique for general matrix multiplication using tiled processing on SIMD architecture
CS Mummidi, S Bal, BF Goldstein, S Srinivasan, S Kundu
2022 IEEE 40th International Conference on Computer Design (ICCD), 529-536, 2022
Cited by 3 · 2022
Themis: A network bandwidth-aware collective scheduling policy for distributed training of DL models
S Rashidi, W Won, S Srinivasan, S Sridharan, T Krishna
Proceedings of the 49th Annual International Symposium on Computer …, 2022
Cited by 27 · 2022
Exploring multi-dimensional hierarchical network topologies for efficient distributed training of trillion parameter DL models
W Won, S Rashidi, S Srinivasan, T Krishna
arXiv preprint arXiv:2109.11762, 2021
Cited by 3 · 2021
Enabling compute-communication overlap in distributed deep learning training platforms
S Rashidi, M Denton, S Sridharan, S Srinivasan, A Suresh, J Nie, ...
2021 ACM/IEEE 48th Annual International Symposium on Computer Architecture …, 2021
Cited by 39 · 2021
Extending sparse tensor accelerators to support multiple compression formats
E Qin, G Jeong, W Won, SC Kao, H Kwon, S Srinivasan, D Das, GE Moon, ...
2021 IEEE International Parallel and Distributed Processing Symposium (IPDPS …, 2021
Cited by 16 · 2021
A lightweight error-resiliency mechanism for deep neural networks
BF Goldstein, VC Ferreira, S Srinivasan, D Das, AS Nery, S Kundu, ...
2021 22nd International Symposium on Quality Electronic Design (ISQED), 311-316, 2021
Cited by 15 · 2021
Enabling compute-communication overlap in distributed training platforms
S Rashidi, S Sridharan, S Srinivasan, M Denton, A Suresh, J Nie, ...
Proceedings of the 48th International Symposium on Computer Architecture (ISCA), 2021
2021
Optimizing deep learning recommender systems training on CPU cluster architectures
D Kalamkar, E Georganas, S Srinivasan, J Chen, M Shiryaev, A Heinecke
SC20: International Conference for High Performance Computing, Networking …, 2020
Cited by 46 · 2020
Astra-sim: Enabling SW/HW co-design exploration for distributed DL training platforms
S Rashidi, S Sridharan, S Srinivasan, T Krishna
2020 IEEE International Symposium on Performance Analysis of Systems and …, 2020
Cited by 53 · 2020
MINT: Microarchitecture for Efficient and Interchangeable CompressioN Formats on Tensor Algebra.
E Qin, G Jeong, W Won, SC Kao, H Kwon, S Srinivasan, D Das, GE Moon, ...
Sandia National Lab. (SNL-NM), Albuquerque, NM (United States), 2020
2020
Reliability evaluation of compressed deep learning models
BF Goldstein, S Srinivasan, D Das, K Banerjee, L Santiago, VC Ferreira, ...
2020 IEEE 11th Latin American Symposium on Circuits & Systems (LASCAS), 1-5, 2020
Cited by 29 · 2020
SIGMA: A sparse and irregular GEMM accelerator with flexible interconnects for DNN training
E Qin, A Samajdar, H Kwon, V Nadella, S Srinivasan, D Das, B Kaul, ...
2020 IEEE International Symposium on High Performance Computer Architecture …, 2020
Cited by 433 · 2020
Efficient communication acceleration for next-gen scale-up deep learning training platforms
S Rashidi, S Sridharan, S Srinivasan, M Denton, T Krishna
arXiv preprint, 2020
Cited by 1 · 2020
Training Google Neural Machine Translation on an Intel CPU cluster
DD Kalamkar, K Banerjee, S Srinivasan, S Sridharan, E Georganas, ...
2019 IEEE International Conference on Cluster Computing (CLUSTER), 1-10, 2019
Cited by 8 · 2019
K-TanH: Efficient TanH for deep learning
A Kundu, A Heinecke, D Kalamkar, S Srinivasan, EC Qin, NK Mellempudi, ...
arXiv preprint arXiv:1909.07729, 2019
Cited by 2 · 2019
High performance scalable FPGA accelerator for deep neural networks
S Srinivasan, P Janedula, S Dhoble, S Avancha, D Das, N Mellempudi, ...
arXiv preprint arXiv:1908.11809, 2019
Cited by 5 · 2019
Mixed precision training with 8-bit floating point
N Mellempudi, S Srinivasan, D Das, B Kaul
arXiv preprint arXiv:1905.12334, 2019
Cited by 76 · 2019
Articles 1–20