Show HN: I built a sub-500ms latency voice agent from scratch
via news.ycombinator.com
I built a voice agent from scratch that averages ~400ms end-to-end latency (phone stop → first syllable, i.e. from the caller going silent to the first audible syllable of the reply). That’s with full STT → LLM → TTS in the loop, clean barge-ins, and no precomputed responses.

What moved the needle:

Voice is a turn-taking problem, not a transcription problem. VAD alone fails because silence by itself cannot distinguish a finished thought from a mid-sentence pause; you need semantic end-of-turn detection. […]
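To make the end-of-turn point concrete, here is a minimal Python sketch. It is not the implementation from the post. It assumes the VAD reports how long the caller has been silent and the STT provides a running transcript. The names `TurnDetector`, `looks_complete`, `TRAILING_CUES`, and both thresholds are hypothetical, and the word-list check is only a stand-in for whatever semantic model a real system would use.

```python
from dataclasses import dataclass

# Hypothetical cue list standing in for a trained end-of-turn classifier.
# A trailing word from this set suggests the speaker is not finished yet.
TRAILING_CUES = ("and", "but", "so", "because", "um", "uh", "the", "to", "or")


def looks_complete(transcript: str) -> bool:
    """Crude semantic check: does the utterance sound finished?"""
    words = transcript.lower().rstrip(".,?! ").split()
    if not words:
        return False
    return words[-1] not in TRAILING_CUES


@dataclass
class TurnDetector:
    # Assumed thresholds for illustration, not taken from the post.
    short_pause_ms: int = 200   # enough silence when the utterance sounds finished
    long_pause_ms: int = 1200   # fallback: end the turn even if it sounds unfinished

    def end_of_turn(self, transcript: str, silence_ms: int) -> bool:
        """Combine VAD silence duration with the semantic completeness check."""
        if silence_ms >= self.long_pause_ms:
            return True
        return silence_ms >= self.short_pause_ms and looks_complete(transcript)


detector = TurnDetector()
# Same 300 ms pause, different decisions depending on what was said.
mid_sentence = detector.end_of_turn("I want to book a flight to", 300)
finished = detector.end_of_turn("I want to book a flight to Boston", 300)
```

With silence alone, both calls above would be treated the same. Adding the semantic check lets the agent answer quickly after a finished sentence while waiting through a mid-sentence pause, and the long-pause fallback keeps it from waiting forever.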