Cutting inference cold starts by 40x with LP, FUSE, C/R, and CUDA-checkpoint
via modal.com
Read the full article at the original source.
Article URL: https://modal.com/blog/truly-serverless-gpus
Comments URL: https://news.ycombinator.com/item?id=48183038
Points: 17
Comments: 0