// pillar guides

Claude Security Guides

23 long-form, citation-backed references. No SEO chum, no AI slop — each guide is maintained against primary sources (Anthropic disclosures, HackerOne reports, OWASP, MITRE ATLAS, live CVEs).

~4,000 words · updated 2026

Claude Prompt Injection & Jailbreak Defense Guide

The complete 2026 playbook. Direct vs indirect injection, AFL attacks, MCP exploit chains, Claudy Day, OWASP LLM Top 10, defensive architecture, red-team checklist.

~3,500 words · updated 2026

Claude Bug Bounty & AI Security Research Guide

Anthropic's Model Safety Bug Bounty explained, AI bounty programs compared (Anthropic, OpenAI, Google, Microsoft, huntr), and how to use Claude Code as an autonomous bug hunter.

16 min read · new 2026

Prompt Injection Payloads Cheatsheet (2026)

A working catalogue of prompt-injection payloads that still land in 2026 across direct, indirect, multimodal, and tool-mediated channels. For authorized red-teaming, bounty triage, and writing detections.

10 min read · new 2026

Top GitHub Repos for AI Security (2026)

The 25+ GitHub repositories that actually matter for AI / LLM security work in 2026: red-team frameworks, jailbreak corpora, guardrail libraries, MCP scanners, and pentest agents.

9 min read · new 2026

Bypassing Llama Guard 3 and Prompt Guard 2 (2026)

Llama Guard 3 and Prompt Guard 2 are the most-deployed open guardrails of 2026, but their public training data and tokenizer make them tractable to bypass. This guide covers five bypass families that consistently land.

8 min read · new 2026

Claude Bug Bounty Payloads & Test Cases (2026)

What Anthropic actually pays for in 2026 — distilled from public HackerOne disclosures, the Model Safety Bug Bounty scope, and patched advisories. Test cases organised by program tier.

11 min read · new 2026

AI-Powered Recon & OSINT Automation with Claude (2026)

A practical, copyable recon pipeline built on Claude Code and MCP servers. Subfinder -> httpx -> nuclei -> Claude triage, plus JS endpoint extraction and parameter mining with LLM scoring.

11 min read · new 2026

Claude Sonnet 4.5 & Opus 4 Jailbreak Research (2026)

Claude Sonnet 4.5 and Opus 4 shipped with Constitutional Classifiers — a second model that scores both input and output for harmful content. This guide tracks which classes of jailbreak still land in 2026, what Anthropic patched, and how researchers earn Model Safety Bug Bounty payouts against the latest defenses.

8 min read · new 2026

Claude Code CVE Roundup: Known Exploits & Fixes (2024–2026)

Claude Code is a privileged terminal agent — it edits your files, runs shell commands, and connects to MCP servers. That privilege has produced a small but interesting CVE history. This page tracks every public advisory we have verified, the root cause, and the fixed version.

9 min read · new 2026

MCP Server CVE Roundup 2026: Tool Poisoning & RCE in the Wild

MCP exploded in 2025 — hundreds of servers shipped by individuals, vendors, and platforms. The CVE volume has caught up. This page indexes the public vulnerabilities by class, with reproduction notes and the upstream fixes.

10 min read · new 2026

Claude for the SOC: AI-Assisted Detection, Triage & IR

Most AI-security content is offensive. The defensive use case is bigger and quieter: SOC analysts use Claude every day for log triage, detection authoring, and IR write-ups. This guide is the practical playbook — what works, what to lock down, and what to keep humans on.

9 min read · new 2026

Computer Use & Browser Agent Security

Claude Computer Use lets a model drive a real keyboard, mouse, and screen. That capability collapses the gap between 'language model' and 'remote-control trojan' — defensively and offensively. Here's the practical threat model and the controls security teams actually deploy.

10 min read · new 2026

RAG Prompt Injection: Defense Patterns That Hold Up

If your app does retrieval-augmented generation, an attacker who can place a single document in your corpus owns your assistant. This guide covers the four defense patterns that actually hold up against adversarial red-teamers in 2026.

12 min read · updated 2026

Claude MCP Server Security: A Practical Hardening Guide

Model Context Protocol turns Claude (and other LLMs) into agents that can read your files, query your databases, and call your APIs. Most public MCP servers ship with permissive defaults, opaque tool descriptions, and zero authentication. This guide walks through the realistic threat model and the controls that actually matter.

11 min read · updated 2026

Claude Code Security: Sandboxing, Secrets, and Agent Discipline

Claude Code is the most capable terminal agent shipping today — and the most dangerous one to run without guardrails. It edits files, runs shell commands, talks to MCP servers, and increasingly takes long-horizon actions. This guide is the security checklist nobody else writes.

13 min read · updated 2026

The AI Red Teaming Playbook: Methodology, Tools, and Deliverables

AI red teaming is not 'jailbreak the chatbot for fun'. It is a structured assurance exercise with a scope document, a threat model, a measurable attack plan, and a deliverable a CISO can sign. This is the playbook used by labs and serious consultancies in 2026.

14 min read · updated 2026

OWASP LLM Top 10 (2025) — Deep Dive with Claude Examples

OWASP's LLM Top 10 is the de facto vocabulary for LLM application risk. The 2025 revision tightened definitions, expanded the Excessive Agency entry, and added vector and embedding weaknesses. This deep dive maps each entry to a Claude-era exploit and to specific controls.

10 min read · updated 2026

Agent Hijacking & Tool Abuse: Attacks on Tool-Using LLMs

Once an LLM gains tools, prompt injection stops being a content problem and becomes an execution problem. This guide is a field manual for hijacking tool-using agents and a defensive playbook for builders.

9 min read · updated 2026

AI-Assisted XSS Hunting: Workflows for Claude and Cursor

XSS is the bread and butter of bug bounty hunting and exactly the workload LLMs accelerate the most. This guide is a working session, not theory.

9 min read · updated 2026

AI-Assisted SQLi Hunting: Boolean, Time-Based, and ORM Edge Cases

Modern apps mostly use ORMs, which makes SQLi feel rare, and that complacency is exactly where it still lives. Claude is excellent at pattern-matching ORM gotchas at scale.

8 min read · updated 2026

AI-Assisted SSRF & IDOR: Cloud-Era Patterns

SSRF and IDOR are the two highest-ROI bugs in cloud-era SaaS. Both reward systematic enumeration — exactly the workload an LLM excels at.

10 min read · updated 2026

AI/ML Supply Chain Security: Models, Adapters, and Notebooks

The riskiest dependency in most AI stacks is not the framework — it is the model weights and the notebook that loaded them. This guide covers the supply-chain surface unique to AI/ML.

10 min read · updated 2026

Claude vs GPT vs Gemini for Security Research (2026)

Every serious security researcher in 2026 runs at least two frontier models. This is the working comparison — not benchmark theatre — for picking the right one per task.
