Within the Electronic Design Automation (EDA) domain, AI-driven solutions have emerged as formidable tools, yet they typically augment rather than redefine existing methodologies …
In this paper, we present a novel technique to search for hardware architectures of accelerators optimized for end-to-end training of deep neural networks (DNNs). Our …
Distributed execution of deep learning training involves a dynamic interplay between hardware accelerator architecture and device placement strategy. This is the first work to …
J Zhu, C Xue, Y Chen, Z Wang, G Sun - arXiv preprint arXiv:2407.02079, 2024 - arxiv.org
The emergence of the large language model (LLM) poses an exponential growth of demand for computation throughput, memory capacity, and communication bandwidth. Such …
F Wang, M Shen, Y Lu, N Xiao - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
The mapping of tensor computation is a complex and important process for spatial accelerators. Today's mapping works depend on hand-tuned kernel libraries or search …
G Dinh, IKJ Valsala, H Luo, C Hong, Y Cho… - … and System Support …, 2023 - openreview.net
Achieving high performance for machine learning domain-specific accelerators requires the careful choice of a mapping from an algorithm to an accelerator. Most algorithms for finding …
Tensor operations play a central role in many applications, such as machine learning, signal processing, and dense linear algebra. These tensor operations are increasing significantly …
The architectural design-space exploration (or DSE) process—whether manual or automated—benefits greatly from knowing the limits of the metrics of interest in advance …
The path to a PhD is mired with taxing twists and turns, and, more than the research, it was the people around me that kept me going. I owe deep thanks to many, but in this brief …