Train-to-Test scaling explained: How to optimize your end-to-end AI compute budget for inference
via arxiv.org
The standard guidelines for building large language models (LLMs) optimize only for training costs and ignore inference costs. This poses a challenge for real-world applications that rely on inference-time scaling techniques, such as drawing multiple reasoning samples from a model at deployment, to increase the accuracy of model responses. To bridge this gap, researchers at University […]