Tetris: Memory-efficient serverless inference through tensor sharing

J Li, L Zhao, Y Yang, K Zhan, K Li - 2022 USENIX Annual Technical Conference (USENIX ATC 22), 2022 - usenix.org
Abstract
Executing complex, memory-intensive deep learning inference services poses a major challenge for serverless computing frameworks, which must densely deploy and maintain inference models to sustain high throughput. We observe an excessive memory consumption problem in serverless inference systems, caused by large model sizes and high data redundancy.
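
The abstract attributes the memory waste to data redundancy across densely deployed model instances. As a rough, hedged illustration of the tensor-sharing idea (not the paper's actual system or API), the sketch below deduplicates identical weight tensors across co-located model instances via a content-addressed pool; TensorPool and intern are hypothetical names introduced here for illustration.

```python
import hashlib
import numpy as np

class TensorPool:
    """Content-addressed store: identical tensors are kept once and shared."""
    def __init__(self):
        self._store = {}  # digest -> shared, read-only ndarray

    def intern(self, tensor: np.ndarray) -> np.ndarray:
        # Hash raw bytes plus dtype/shape so only truly identical tensors collide.
        digest = hashlib.sha256(
            tensor.tobytes()
            + str(tensor.dtype).encode()
            + str(tensor.shape).encode()
        ).hexdigest()
        if digest not in self._store:
            shared = tensor.copy()
            shared.setflags(write=False)  # shared copies must stay immutable
            self._store[digest] = shared
        return self._store[digest]

# Two co-located "model instances" loading the same pretrained weights:
pool = TensorPool()
weights_a = np.ones((1024, 1024), dtype=np.float32)
weights_b = np.ones((1024, 1024), dtype=np.float32)
shared_a = pool.intern(weights_a)
shared_b = pool.intern(weights_b)
assert shared_a is shared_b  # one physical copy backs both instances
```

A production system would presumably intern tensors at model-load time and back the shared copies with OS-level shared memory so that separate function instances, not just objects in one process, reference the same physical pages; this sketch only shows the deduplication step within a single process.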