Huawei’s new open-source technique shrinks LLMs so they can run on less powerful, less expensive hardware
via arxiv.org
Short excerpt below. Read at the original source.
Huawei’s Computing Systems Lab in Zurich has introduced a new open-source quantization method for large language models (LLMs) aimed at reducing memory demands without sacrificing output quality. The technique, called SINQ (Sinkhorn-Normalized Quantization), is designed to be fast, calibration-free, and easy to integrate into existing model workflows. The code for performing it has been made […]
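The excerpt names Sinkhorn-style normalization as SINQ's core idea: rather than a single scale factor per weight matrix or group, magnitudes are balanced with separate per-row and per-column scales before rounding, which keeps outlier rows or columns from dominating the quantization grid. As a rough illustration only, here is a minimal NumPy sketch of that general idea (alternating dual-scale normalization, then uniform 4-bit rounding). This is not the authors' released code: the function names, iteration count, and symmetric int4 grid are all illustrative assumptions, and the real method involves details (per-group scales, its specific normalization objective) not shown here.

```python
import numpy as np

def dual_scales(W, n_iter=16, eps=1e-8):
    """Alternately balance the mean magnitude of rows and columns
    (Sinkhorn-Knopp-style), returning per-row and per-column scale
    vectors r, c so that W / (r * c) is roughly magnitude-balanced.
    Illustrative assumption, not SINQ's actual update rule."""
    m, n = W.shape
    r = np.ones((m, 1), dtype=W.dtype)
    c = np.ones((1, n), dtype=W.dtype)
    A = np.abs(W)
    for _ in range(n_iter):
        r *= np.maximum((A / (r * c)).mean(axis=1, keepdims=True), eps)
        c *= np.maximum((A / (r * c)).mean(axis=0, keepdims=True), eps)
    return r, c

def quantize_int4(W, r, c):
    """Uniform symmetric 4-bit quantization of the balanced matrix."""
    Wn = W / (r * c)
    step = np.abs(Wn).max() / 7.0           # map max magnitude to level 7
    Q = np.clip(np.round(Wn / step), -8, 7).astype(np.int8)
    return Q, step

def dequantize(Q, step, r, c):
    """Reconstruct floats; the dual scales live outside the int4 grid."""
    return Q.astype(np.float32) * step * (r * c)

# Toy check: one high-magnitude column would otherwise force a huge
# global step size and flatten all the ordinary weights to zero.
rng = np.random.default_rng(0)
W = rng.normal(size=(128, 128)).astype(np.float32)
W[:, 0] *= 50.0                             # inject an outlier column

r, c = dual_scales(W)
Q, step = quantize_int4(W, r, c)
err_dual = np.abs(W - dequantize(Q, step, r, c)).mean()

ones_r, ones_c = np.ones((128, 1)), np.ones((1, 128))
Q0, step0 = quantize_int4(W, ones_r, ones_c)
err_flat = np.abs(W - dequantize(Q0, step0, ones_r, ones_c)).mean()

print(f"mean abs error, single scale: {err_flat:.4f}")
print(f"mean abs error, dual scales:  {err_dual:.4f}")
```

In the toy check, the per-column scale absorbs the outlier column, so the remaining matrix fits the 4-bit grid far better than with one global scale. No calibration data is involved, which is consistent with the calibration-free property the excerpt highlights, though the sketch makes no claim to match SINQ's actual numbers or procedure.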