X Fu, W Yang, D Dong, X Su - Proceedings of the 38th ACM International …, 2024 - dl.acm.org
38 days ago - … high-performance model inference, optimizing the time-consuming attention module
is crucial. Owing to the irregular-shaped matrix … , our decisions on loop permutation, tiling, …