AI/ML Supply Chain Security: Models, Adapters, and Notebooks
The riskiest dependency in most AI stacks is not the framework but the model weights and the notebook that loads them. This guide covers the supply-chain attack surface specific to AI/ML.
Model weights and adapters
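- Pickle-based checkpoints (.bin, .pt, .pth, .ckpt): unpickling can execute arbitrary code at load time. Treat as 'do not load' unless the source and hash are verified. Where loading is unavoidable, torch.load(..., weights_only=True) restricts what the unpickler may construct.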
- Safetensors: safe by format, but adapter ecosystems mix safetensors with custom Python loaders that re-introduce RCE.
- GGUF (llama.cpp): no pickle, so generally safe to parse, but the tokenizer metadata and chat template embedded in the file are attacker-controlled input and can carry prompt injection.
- LoRA adapters: tiny, easy to backdoor, often pulled with zero review. Verify hashes; prefer signed releases.
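To make the pickle risk above concrete, the sketch below lists which callables a pickle stream would import, using only the standard library's `pickletools` and never unpickling the data. The function name `imported_globals` is hypothetical, and the `STACK_GLOBAL` handling is a heuristic rather than a full stack simulation, so treat it as a triage aid under those assumptions, not a complete scanner.

```python
import pickletools

# Opcodes whose argument is a string pushed onto the pickle stack.
_STRING_OPCODES = {"SHORT_BINUNICODE", "BINUNICODE", "BINUNICODE8", "UNICODE"}


def imported_globals(data: bytes) -> set[str]:
    """List 'module.name' references in a pickle stream without unpickling it."""
    names: set[str] = set()
    strings: list[str] = []  # string arguments seen so far
    for opcode, arg, _pos in pickletools.genops(data):
        if opcode.name in _STRING_OPCODES:
            strings.append(arg)
        elif opcode.name == "GLOBAL":
            # Older protocols carry "module name" directly in the opcode argument.
            module, name = arg.split(" ", 1)
            names.add(f"{module}.{name}")
        elif opcode.name == "STACK_GLOBAL" and len(strings) >= 2:
            # Protocol 4+ pushes module and name as the two preceding strings.
            # Heuristic: strings re-read from the memo (BINGET) are not resolved.
            names.add(f"{strings[-2]}.{strings[-1]}")
    return names
```

A pickle that holds only plain data yields an empty set. Legitimate checkpoints do import a small number of tensor-rebuilding helpers, so in practice the result would be compared against a reviewed allowlist, and any name outside it is a reason to refuse the file. Recent PyTorch checkpoints are zip archives with the pickle stored inside, so the stream would need to be extracted first; that step is not shown here.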
HuggingFace-specific risks
- Typosquatted repos mimicking popular models.
- trust_remote_code=True copied from tutorials: it runs Python code from the repo at load time.
- Datasets with custom loading scripts (a Python file shipped in the dataset repo): same RCE risk as trust_remote_code.
- Spaces (Gradio/Streamlit) executing untrusted user uploads.
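One way to catch the typosquatting case above is to compare every requested repo ID against an allowlist of reviewed repos before anything is downloaded. The sketch below uses `difflib` from the standard library; the function name `check_repo_id`, the repo IDs, and the 0.85 similarity threshold are all illustrative assumptions, not recommended values.

```python
import difflib

# Hypothetical allowlist of repo IDs that have been reviewed.
APPROVED_REPOS = {"example-org/base-model-7b"}


def check_repo_id(repo_id: str, approved: set[str], threshold: float = 0.85) -> str:
    """Classify a repo ID as 'approved', 'lookalike', or 'unknown'."""
    if repo_id in approved:
        return "approved"
    for known in approved:
        ratio = difflib.SequenceMatcher(None, repo_id.lower(), known.lower()).ratio()
        if ratio >= threshold:
            # Close to a reviewed name but not identical: possible typosquat.
            return "lookalike"
    return "unknown"
```

A 'lookalike' result would block the download outright, while 'unknown' would route the repo to review. String similarity will miss squats that use a different organization name with the same model name, so this complements rather than replaces pinning an exact repo ID and revision.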
Notebook supply chain
Jupyter notebooks combine the worst of both worlds: they pin nothing, their saved outputs can carry HTML and JavaScript that some viewers render on open, and they are the canonical way to share ML examples. Convert notebooks to .py scripts in CI so they can be reviewed and scanned like any other code; pin all package versions; never run a notebook outside an isolated kernel.
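As a rough illustration of the CI step, the sketch below reads a notebook as plain JSON, pulls out the code cells, and reports `pip install` arguments that lack an exact `==` pin. It assumes the nbformat 4 layout (a top-level `cells` list whose entries have `cell_type` and `source`); the function names are hypothetical.

```python
import json
import re

# Matches "!pip install ..." and "%pip install ..." lines in a code cell.
PIP_LINE = re.compile(r"^\s*[!%]\s*pip\s+install\s+(.*)$")


def code_sources(notebook_json: str) -> list[str]:
    """Return the source text of every code cell in an nbformat 4 notebook."""
    notebook = json.loads(notebook_json)
    sources = []
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") == "code":
            src = cell.get("source", "")
            # "source" may be one string or a list of line strings.
            sources.append("".join(src) if isinstance(src, list) else src)
    return sources


def unpinned_installs(notebook_json: str) -> list[str]:
    """Return pip install arguments that carry no exact '==' version pin."""
    unpinned = []
    for source in code_sources(notebook_json):
        for line in source.splitlines():
            match = PIP_LINE.match(line)
            if not match:
                continue
            for token in match.group(1).split():
                if token.startswith("-"):
                    continue  # skip flags such as -q; flag values are not handled
                if "==" not in token:
                    unpinned.append(token)
    return unpinned
```

A CI job could fail the build whenever `unpinned_installs` returns a non-empty list. The parsing is deliberately naive: it does not understand flags that take a value (such as `-r requirements.txt`) or installs run through `subprocess`, so it is a first filter, not a guarantee.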
Toward an SBOM for models
Treat each model as a component with its own record: name, version, hash, source URL, license, training-data provenance, fine-tune lineage, and evaluation set hash. Tools: Sigstore's model-transparency project for signing, the MLflow Model Registry for versioning and lineage, and the emerging CycloneDX ML-BOM profile.
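The component fields listed above can be captured in a small record and checked against the file on disk. The sketch below is a minimal illustration using a dataclass and SHA-256; the class name, field names, and helper functions are hypothetical and do not follow the CycloneDX schema or any tool's format.

```python
import hashlib
import hmac
import json
from dataclasses import asdict, dataclass, field
from typing import Optional


@dataclass
class ModelComponent:
    """Illustrative SBOM-style record for one model artifact."""
    name: str
    version: str
    sha256: str
    source_url: str
    license: str
    training_data: list[str] = field(default_factory=list)  # provenance notes
    base_model: Optional[str] = None       # fine-tune lineage
    eval_set_sha256: Optional[str] = None  # hash of the evaluation set


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large weight files are not read into memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_record(component: ModelComponent, path: str) -> bool:
    """True when the file on disk has the hash stored in the record."""
    return hmac.compare_digest(sha256_of(path), component.sha256)


def to_json(component: ModelComponent) -> str:
    """Serialize the record so it can be stored next to the artifact."""
    return json.dumps(asdict(component), indent=2, sort_keys=True)
```

In this sketch the record would be written once at review time and `matches_record` called before every load, so a swapped or re-uploaded file fails closed. A hash only proves the file is unchanged since review; a signature is still needed to prove who published it.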
FAQ
Is safetensors completely safe?
The format itself, yes. The surrounding loader, tokenizer config, and adapter scripts are not. Audit the whole load path.
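A first pass at auditing the whole load path can be as simple as sorting every file in a downloaded model directory by what it could do at load time. The sketch below does that by file extension alone; the category names, the extension sets, and the function names are assumptions for illustration, and extensions can of course be misleading.

```python
from pathlib import Path

# Assumed extension groups; adjust to the artifacts actually in use.
PICKLE_SUFFIXES = {".bin", ".pt", ".pth", ".pkl", ".ckpt"}
CONFIG_SUFFIXES = {".json", ".jinja", ".yaml", ".yml"}


def audit_model_dir(root: str) -> dict[str, list[str]]:
    """Group the files under a model directory by load-time risk category."""
    report: dict[str, list[str]] = {
        "code": [], "pickle": [], "safetensors": [], "config": [], "other": [],
    }
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        rel = path.relative_to(root).as_posix()
        suffix = path.suffix.lower()
        if suffix == ".py":
            report["code"].append(rel)         # custom loaders, modeling code
        elif suffix in PICKLE_SUFFIXES:
            report["pickle"].append(rel)       # may execute code when loaded
        elif suffix == ".safetensors":
            report["safetensors"].append(rel)  # tensor data only
        elif suffix in CONFIG_SUFFIXES:
            report["config"].append(rel)       # tokenizer config, templates
        else:
            report["other"].append(rel)
    return report


def needs_review(report: dict[str, list[str]]) -> bool:
    """True when the directory ships Python code or pickle-based files."""
    return bool(report["code"] or report["pickle"])
```

A repo that advertises safetensors weights but also ships a .py file shows up under 'code' and trips `needs_review`, which is the situation the answer above warns about. Files in the 'config' group cannot run code by themselves in this model, but they still deserve a read because templates and tokenizer settings shape what the model sees.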