Cortex: A compiler for recursive deep learning models

P Fegade, T Chen, P Gibbons… - Proceedings of Machine …, 2021 - proceedings.mlsys.org
Optimizing deep learning models is generally performed in two steps: (i) high-level graph
optimizations such as kernel fusion and (ii) low-level kernel optimizations such as those …

ACS: Concurrent Kernel Execution on Irregular, Input-Dependent Computational Graphs

S Durvasula, A Zhao, R Kiguru, Y Guan, Z Chen… - arXiv preprint arXiv …, 2024 - arxiv.org
GPUs are widely used to accelerate many important classes of workloads today. However,
we observe that several important emerging classes of workloads, including simulation …