Cutting inference cold starts by 40x with LP, FUSE, C/R, and CUDA-checkpoint

via modal.com

Article URL: https://modal.com/blog/truly-serverless-gpus
Comments URL: https://news.ycombinator.com/item?id=48183038
Points: 17
Comments: 0
