// the scoreboard

Frontier Model Security Benchmark Scoreboard

Where Claude and other frontier models stand on the benchmarks that matter for offensive and defensive security. Every number links to its primary source: no cherry-picking, no marketing claims.

last reviewed: 2026-06-28

Cybench

40 professional-grade CTF tasks (web, crypto, rev, pwn, forensics) curated from HackTheBox, Sekai, Glacier, and HKCert. A widely used academic benchmark for autonomous offensive security.

higher is better

Claude Opus 4.8 (Fable) · Anthropic, Jun 2026 · new SOTA

48.5%

GPT-5.5 · OpenAI, May 2026

44%

Gemini 3.5 Pro Deep Think · Google DeepMind, Apr 2026

41.5%

Grok 5 · xAI, Jun 2026

36%

Claude Sonnet 4.7

34.5%

DeepSeek V4-Reasoner · open weights

27.5%

Claude 3.5 Sonnet · original paper baseline

17.5%

source: Cybench paper · Stanford CRFM (arXiv:2408.08926)

// the launch log

Every AI Model Shipped to Market

Frontier and notable open-weight releases from every major lab, including Anthropic, OpenAI, Google, Meta, xAI, DeepSeek, Mistral, Alibaba, Cohere, and NVIDIA. Updated as new models drop.

78 of 78 models

2026 · 35 releases

Claude Opus 4.8 (Fable)

Anthropic · 2026-06

multimodal + agentic · ctx 1M

Codename 'Fable': long-horizon agent runs, top SWE-bench

Claude Opus 4.5

Anthropic · 2026-05

multimodal + tools · ctx 500K

Long-horizon agentic coding

Claude Sonnet 4.7

Anthropic · 2026-04

multimodal · ctx 300K

Claude Haiku 4.5

Anthropic · 2026-03

fast multimodal · ctx 200K

OpenAI · 2026-05

multimodal + reasoning · ctx 1M

State-of-the-art reasoning

GPT-5.4 / 5.4 Pro / Mini / Nano

OpenAI · 2026-03

multimodal + reasoning · ctx 400K

OpenAI · 2026-02

multimodal · ctx 400K

OpenAI · 2026-04

reasoning · ctx 400K

Gemini 3.5 Pro / Flash / Deep Think

Google DeepMind · 2026-04

multimodal + agent · ctx 2M

Best-in-class on GPQA & MMMU

Gemini 3.1 Pro / Flash-Lite / Flash-Image

Google DeepMind · 2026-02

multimodal · ctx 2M

Gemini 3 Pro Image / Flash

Google DeepMind · 2026-01

image gen + reasoning · ctx 1M

multimodal + reasoning · ctx 1M

Grok 4 / Grok 4 Heavy

multimodal + tools · ctx 256K

DeepSeek V4 / V4-Reasoner

DeepSeek · 2026-04

MoE · open weights · ctx 256K

DeepSeek · 2026-05

reasoning · open weights · ctx 256K

Qwen3-Max / Qwen3-Coder-Max

Alibaba · 2026-03

open weights · ctx 1M

Alibaba · 2026-05

multimodal · open weights · ctx 1M

Llama 4 Behemoth / Maverick / Scout

Meta · 2026-04

MoE · open weights · ctx 10M (Scout)

Llama 4.1 Reasoning

Meta · 2026-06

reasoning · open weights · ctx 1M

Mistral Large 3 / Medium 3

Mistral AI · 2026-03

multimodal · ctx 256K

Mistral AI · 2026-05

Command R+ 2026 / Command A Vision

Cohere · 2026-02

RAG + agentic · ctx 256K

GLM-5.2 / GLM-5

Zhipu AI · 2026-05

open weights · agentic · ctx 1M

Strong Chinese OSS frontier model

Moonshot AI · 2026-04

agentic · open weights · ctx 2M

MiniMax · 2026-03

MoE · open weights · ctx 1M

Nous Research · 2026-04

uncensored · open weights · ctx 256K

Phi-5 / Phi-5-mini

Microsoft · 2026-03

small reasoning · open weights · ctx 256K

Reka AI · 2026-04

multimodal · ctx 256K

01.AI · 2026-02

open weights · ctx 200K

Baidu · 2026-05

multimodal · ctx 256K

ByteDance · 2026-04

multimodal · ctx 1M

open weights · ctx 128K

AI21 Labs · 2026-04

hybrid SSM · open weights · ctx 512K

Inflection-3 Pi

Inflection AI · 2026-02

conversational · ctx 128K

Stable LM 3 / Stable Code 3

Stability AI · 2026-03

open weights · ctx 128K

2025 · 23 releases

Claude Sonnet 4.5

Anthropic · 2025-09

multimodal + computer use · ctx 200K

Best on SWE-bench Verified (77.2%)

Claude Opus 4.1

Anthropic · 2025-08

multimodal · ctx 200K

Claude Opus 4 / Sonnet 4

Anthropic · 2025-05

multimodal · ctx 200K

Claude 3.7 Sonnet

Anthropic · 2025-02

extended thinking · ctx 200K

GPT-5 / GPT-5 Mini / Nano

OpenAI · 2025-08

multimodal + reasoning · ctx 400K

GPT-4.1 / 4.1 Mini / Nano

OpenAI · 2025-04

multimodal · ctx 1M

o3 / o3-mini / o4-mini

OpenAI · 2025-01

reasoning · ctx 200K

Gemini 2.5 Pro / Flash / Flash-Lite

Google DeepMind · 2025-03

multimodal + reasoning · ctx 2M

Gemini 2.5 Flash Image (Nano Banana)

Google DeepMind · 2025-08

image gen + edit · ctx 1M

Gemini 2.0 Flash / Pro

Google DeepMind · 2025-02

multimodal + tools · ctx 1M

Grok 3 / Grok 3 Reasoning

multimodal · ctx 1M

DeepSeek V3.1 / V3.2

DeepSeek · 2025-08

MoE · open weights · ctx 128K

DeepSeek · 2025-01

reasoning · open weights · ctx 128K

Disrupted reasoning-model pricing

Qwen3 / Qwen3-Coder / VL

Alibaba · 2025-05

open weights · ctx 256K

Llama 3.3 70B / Llama 4

Meta · 2025-04

MoE · open weights · ctx 128K–10M

Mistral Large 2.1 / Codestral 25

Mistral AI · 2025-03

code + text · ctx 128K

Hermes 4 (405B)

Nous Research · 2025-07

uncensored · open weights · ctx 131K

Phi-4 / Phi-4-mini

Microsoft · 2025-01

small reasoning · open weights · ctx 128K

Cohere · 2025-03

agentic · ctx 256K

Reka AI · 2025-03

multimodal · open weights · ctx 128K

Moonshot AI · 2025-07

agentic · open weights · ctx 2M

GLM-4.5 / GLM-4.6

Zhipu AI · 2025-07

open weights · ctx 128K

open weights · ctx 32K

2024 · 15 releases

Claude 3.5 Sonnet (v2) / Haiku

Anthropic · 2024-10

multimodal + computer use · ctx 200K

Claude 3.5 Sonnet

Anthropic · 2024-06

multimodal · ctx 200K

Claude 3 Opus / Sonnet / Haiku

Anthropic · 2024-03

multimodal · ctx 200K

GPT-4o / 4o-mini

OpenAI · 2024-05

omni multimodal · ctx 128K

o1 / o1-mini / o1-pro

OpenAI · 2024-09

reasoning · ctx 200K

Gemini 1.5 Pro / Flash

Google DeepMind · 2024-02

multimodal · ctx 2M

Llama 3.1 405B / 70B / 8B

Meta · 2024-07

open weights · ctx 128K

Llama 3.2 (Vision)

Meta · 2024-09

multimodal · open weights · ctx 128K

Mistral Large 2 / Nemo / Codestral

Mistral AI · 2024-07

open + closed · ctx 128K

DeepSeek V3 / V2.5

DeepSeek · 2024-12

MoE · open weights · ctx 128K

Qwen 2.5 / 2.5-Coder / VL

Alibaba · 2024-09

open weights · ctx 128K

Grok 2 / Grok 2 mini

multimodal · ctx 128K

Command R+ / Command R

Cohere · 2024-04

RAG-tuned · ctx 128K

Phi-3 / Phi-3.5

Microsoft · 2024-04

small · open weights · ctx 128K

Nemotron 4 340B

NVIDIA · 2024-06

open weights · ctx 4K

2023 · 5 releases

GPT-4 / GPT-4 Turbo

OpenAI · 2023-03

multimodal · ctx 128K

Claude 2 / Claude 2.1

Anthropic · 2023-07

Gemini 1.0 Ultra / Pro / Nano

Google DeepMind · 2023-12

multimodal · ctx 32K

Llama 2 7B / 13B / 70B

Meta · 2023-07

open weights · ctx 4K

Opened the modern open-source LLM era

Mistral 7B / Mixtral 8x7B

Mistral AI · 2023-09

open weights · ctx 32K

// fresh research · auto-refreshed daily

Latest research & evals

Recent papers, benchmark updates and red-team write-ups relevant to frontier-model security.

updated 00:00 IST
