N-Day-Bench – Can LLMs find real vulnerabilities in real codebases?
via ndaybench.winfunc.com
N-Day-Bench tests whether frontier LLMs can find known security vulnerabilities in real repository code. Each month it pulls fresh cases from GitHub security advisories, checks out the repo at the last commit before the patch, and gives models a sandboxed bash shell to explore the codebase. Static vulnerability discovery benchmarks become outdated quickly. Cases leak […]
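The case-selection step described above can be sketched in Python. This is a minimal illustration under stated assumptions, not N-Day-Bench's actual pipeline. The `Advisory` record, `monthly_cases`, and `checkout_command` are hypothetical names. Advisories are assumed to be already fetched into memory, and the example IDs and repos are placeholders. "The last commit before the patch" is assumed to be the first parent of a single, non-merge fix commit, written in git's `<sha>^` notation.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class Advisory:
    # Hypothetical, simplified advisory record (not the GitHub API schema).
    advisory_id: str
    repo: str
    published: date
    fix_commit: Optional[str]  # SHA of the patching commit, if one is linked


def monthly_cases(advisories: List[Advisory], year: int, month: int) -> List[Advisory]:
    """Keep advisories published in the given month that link a fix commit.

    Drawing only recent advisories is one way to reduce the chance that a
    case already appears in a model's training data.
    """
    return [
        a
        for a in advisories
        if a.published.year == year and a.published.month == month and a.fix_commit
    ]


def checkout_command(advisory: Advisory) -> List[str]:
    # "<sha>^" is git's notation for the first parent of <sha>, i.e. the
    # tree just before the fix. Assumes the fix is one non-merge commit.
    if advisory.fix_commit is None:
        raise ValueError(f"{advisory.advisory_id} has no linked fix commit")
    return ["git", "checkout", f"{advisory.fix_commit}^"]


if __name__ == "__main__":
    # Placeholder data for illustration only.
    advisories = [
        Advisory("ADV-1", "example/a", date(2025, 3, 4), "abc123"),
        Advisory("ADV-2", "example/b", date(2025, 2, 27), "def456"),
        Advisory("ADV-3", "example/c", date(2025, 3, 15), None),
    ]
    cases = monthly_cases(advisories, 2025, 3)
    print([a.advisory_id for a in cases])  # ['ADV-1']
    print(checkout_command(cases[0]))  # ['git', 'checkout', 'abc123^']
```

Only `ADV-1` survives the filter: `ADV-2` falls in the wrong month and `ADV-3` has no fix commit to step back from. A real pipeline would also need to handle fixes that span several commits, where the first parent of the earliest fix commit is the more likely pre-patch state.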