Show HN: RunAnywhere – Faster AI Inference on Apple Silicon

via github.com


Hi HN, we’re Sanchit and Shubham (YC W26). We built a fast inference engine for Apple Silicon. LLMs, speech-to-text, text-to-speech – MetalRT beats llama.cpp, Apple’s MLX, Ollama, and sherpa-onnx on every modality we tested. Custom Metal shaders, no framework overhead. Also, we’ve open-sourced RCLI, the fastest end-to-end voice AI pipeline on Apple Silicon. Mic to […]
