Show HN: OSS Agent I built topped the TerminalBench on Gemini-3-flash-preview

via debugml.github.io

Scored 65.2% vs. Google's official 47.8% and the 64.3% of Junie CLI, the existing top closed-source model. Since there have been a lot of reports of deliberate cheating on TerminalBench 2.0 lately (https://debugml.github.io/cheating-agents/), I would also like to clarify a few things:

1. Absolutely no {agents/skills}.md files were inserted at any point. No cheating mechanisms whatsoever.
2. […]
