Show HN: Needle: We Distilled Gemini Tool Calling into a 26M Model

via github.com


Hey HN, Henry here from Cactus. We open-sourced Needle, a 26M-parameter function-calling (tool use) model. It runs at 6000 tok/s prefill and 1200 tok/s decode on consumer devices. We were always frustrated by how little effort goes into building agentic models that run on budget phones, so we conducted investigations that led to an […]
