challenge for serverless computing frameworks, which must densely deploy and maintain
inference models at high throughput. We observe excessive memory consumption in
serverless inference systems, caused by large model sizes and high data redundancy.