Claude Security Guides
23 long-form, citation-backed references. No SEO chum, no AI slop — each guide is maintained against primary sources (Anthropic disclosures, HackerOne reports, OWASP, MITRE ATLAS, live CVEs).
Claude Prompt Injection & Jailbreak Defense Guide
The complete 2026 playbook. Direct vs indirect injection, AFL attacks, MCP exploit chains, Claudy Day, OWASP LLM Top 10, defensive architecture, red-team checklist.
Claude Bug Bounty & AI Security Research Guide
Anthropic's Model Safety Bug Bounty explained, AI bounty programs compared (Anthropic, OpenAI, Google, Microsoft, huntr), and how to use Claude Code as an autonomous bug hunter.
Prompt Injection Payloads Cheatsheet (2026)
A working catalogue of prompt-injection payloads that still land in 2026 across direct, indirect, multimodal, and tool-mediated channels. For authorized red-teaming, bounty triage, and writing detections.
Top GitHub Repos for AI Security (2026)
The 25+ GitHub repositories that actually matter for AI/LLM security work in 2026: red-team frameworks, jailbreak corpora, guardrail libraries, MCP scanners, and pentest agents.
Bypassing Llama Guard 3 and Prompt Guard 2 (2026)
Llama Guard 3 and Prompt Guard 2 are the most-deployed open guardrails of 2026, but their public training data and tokenizer make them tractable to bypass. This guide covers five bypass families that consistently land.
Claude Bug Bounty Payloads & Test Cases (2026)
What Anthropic actually pays for in 2026 — distilled from public HackerOne disclosures, the Model Safety Bug Bounty scope, and patched advisories. Test cases organised by program tier.
AI-Powered Recon & OSINT Automation with Claude (2026)
A practical, copyable recon pipeline built on Claude Code and MCP servers. Subfinder -> httpx -> nuclei -> Claude triage, plus JS endpoint extraction and parameter mining with LLM scoring.
Claude Sonnet 4.5 & Opus 4 Jailbreak Research (2026)
Claude Sonnet 4.5 and Opus 4 shipped with Constitutional Classifiers — a second model that scores both input and output for harmful content. This guide tracks which classes of jailbreak still land in 2026, what Anthropic patched, and how researchers earn Model Safety Bug Bounty payouts against the latest defenses.
Claude Code CVE Roundup: Known Exploits & Fixes (2024–2026)
Claude Code is a privileged terminal agent — it edits your files, runs shell commands, and connects to MCP servers. That privilege has produced a small but interesting CVE history. This page tracks every public advisory we have verified, the root cause, and the fixed version.
MCP Server CVE Roundup 2026: Tool Poisoning & RCE in the Wild
MCP exploded in 2025 — hundreds of servers shipped by individuals, vendors, and platforms. The CVE volume has caught up. This page indexes the public vulnerabilities by class, with reproduction notes and the upstream fixes.
Claude for the SOC: AI-Assisted Detection, Triage & IR
Most AI-security content is offensive. The defensive use case is bigger and quieter: SOC analysts use Claude every day for log triage, detection authoring, and IR write-ups. This guide is the practical playbook — what works, what to lock down, and what to keep humans on.
Computer Use & Browser Agent Security
Claude Computer Use lets a model drive a real keyboard, mouse, and screen. That capability collapses the gap between 'language model' and 'remote-control trojan' — defensively and offensively. Here's the practical threat model and the controls security teams actually deploy.
RAG Prompt Injection: Defense Patterns That Hold Up
If your app does retrieval-augmented generation, an attacker who can place a single document in your corpus owns your assistant. This guide covers the four defense patterns that actually hold up against adversarial red-teamers in 2026.
Claude MCP Server Security: A Practical Hardening Guide
Model Context Protocol turns Claude (and other LLMs) into agents that can read your files, query your databases, and call your APIs. Most public MCP servers ship with permissive defaults, opaque tool descriptions, and zero authentication. This guide walks through the realistic threat model and the controls that actually matter.
Claude Code Security: Sandboxing, Secrets, and Agent Discipline
Claude Code is the most capable terminal agent shipping today — and the most dangerous one to run without guardrails. It edits files, runs shell commands, talks to MCP servers, and increasingly takes long-horizon actions. This guide is the security checklist nobody else writes.
The AI Red Teaming Playbook: Methodology, Tools, and Deliverables
AI red teaming is not 'jailbreak the chatbot for fun'. It is a structured assurance exercise with a scope document, a threat model, a measurable attack plan, and a deliverable a CISO can sign. This is the playbook used by labs and serious consultancies in 2026.
OWASP LLM Top 10 (2025) — Deep Dive with Claude Examples
OWASP's LLM Top 10 is the de facto vocabulary for LLM application risk. The 2025 revision tightened definitions and added agency and vector-DB risks. This deep dive maps each entry to a Claude-era exploit and to specific controls.
Agent Hijacking & Tool Abuse: Attacks on Tool-Using LLMs
Once an LLM gains tools, prompt injection stops being a content problem and becomes an execution problem. This guide is a field manual for hijacking tool-using agents and a defensive playbook for builders.
AI-Assisted XSS Hunting: Workflows for Claude and Cursor
XSS is the bread-and-butter of bug bounty and exactly the workload LLMs accelerate the most. This guide is a working session, not theory.
AI-Assisted SQLi Hunting: Boolean, Time-Based, and ORM Edge Cases
Modern apps mostly use ORMs, which makes SQLi feel rare, and that complacency is exactly where it still lives. Claude is excellent at pattern-matching ORM gotchas at scale.
AI-Assisted SSRF & IDOR: Cloud-Era Patterns
SSRF and IDOR are the two highest-ROI bugs in cloud-era SaaS. Both reward systematic enumeration — exactly the workload an LLM excels at.
AI/ML Supply Chain Security: Models, Adapters, and Notebooks
The riskiest dependency in most AI stacks is not the framework — it is the model weights and the notebook that loaded them. This guide covers the supply-chain surface unique to AI/ML.
Claude vs GPT vs Gemini for Security Research (2026)
Every serious security researcher in 2026 runs at least two frontier models. This is the working comparison — not benchmark theatre — for picking the right one per task.