Frontier Model Security Benchmarks
Where Claude and other frontier models stand on the benchmarks that matter for offensive and defensive security. Every number links to the primary source — no cherry-picking, no marketing claims.
Cybench
40 professional-grade CTF tasks (web, crypto, rev, pwn, forensics) curated from four competitions: HackTheBox, Sekai, Glacier and HKCert. The standard academic test for autonomous offensive security.
Every AI Model Shipped to Market
Frontier and notable open-weight releases from every major lab — Anthropic, OpenAI, Google, Meta, xAI, DeepSeek, Mistral, Alibaba, Cohere, NVIDIA and more. Updated as new models drop.
Long-horizon agentic coding
State-of-the-art reasoning
Best on SWE-bench Verified (77.2%)
Disrupted reasoning-model pricing
Opened the modern open-source LLM era
Latest research & evals
Recent papers, benchmark updates and red-team write-ups relevant to frontier-model security.