How do you vet AI models before using them in projects?
🎯 What You’ll Learn in This Article
⏱️ 40 min read · 3 exercises
📋 AI Supply Chain Attacks 2026
Traditional software supply chain security — demonstrated by SolarWinds and the XZ Utils backdoor — established that attackers can compromise widely used components at the source, affecting every downstream user simultaneously. The AI supply chain follows the same principle with attack surfaces specific to machine learning: model weight files, training datasets, and ML pipeline libraries. Unlike software supply chain attacks where the payload is usually in code, AI supply chain attacks can embed adversarial behaviour in model weight parameters — a format most security teams have no tooling to inspect. This connects directly to the AI security series: compromised models are one of the upstream conditions that make runtime injection attacks more severe.
The AI Supply Chain Attack Surface
The AI model development pipeline has four primary supply chain attack surfaces, each sitting at a different stage of the model lifecycle from training to deployment.
Model repositories. Platforms like Hugging Face host pre-trained models as downloadable files. With over 500,000 models on Hugging Face alone as of 2026, these repositories represent the highest-volume and most accessible attack surface. Attackers upload poisoned models directly, register accounts with names similar to legitimate publishers (typosquatting), or attempt to compromise verified publisher accounts to replace genuine models with backdoored versions. The asymmetry is significant: a single poisoned upload can affect every team that downloads that model, multiplying the attacker’s impact without multiplying their effort.
Training datasets. Public datasets used for training or fine-tuning sit upstream of the model weights. If an attacker injects adversarial examples into a dataset before training begins, the resulting model inherits that behaviour. This is particularly relevant for datasets scraped from the web — where attacker-controlled pages can contribute content to large training corpora — and for datasets distributed through data repositories with limited upload controls.
ML libraries and Python packages. The Python ML ecosystem depends on a large set of packages — PyTorch, TensorFlow, Hugging Face Transformers, LangChain, and supporting libraries. Compromising any widely used ML package through standard Python supply chain attack techniques (typosquatting, account takeover, dependency confusion) would affect every team using that package in their pipeline.
Fine-tuning and training infrastructure. Teams that fine-tune pre-trained models on domain-specific data inherit any backdoors from the base model. Training scripts, evaluation frameworks, and data preprocessing pipelines that receive less security scrutiny than deployed application code are also components of the supply chain.
Pickle Files and Code Execution on Model Load
The most immediately dangerous AI supply chain attack requires no understanding of machine learning at all. Python’s pickle module — historically the default serialisation format for PyTorch model files — executes arbitrary Python code during deserialisation. When Python reconstructs a pickle object, it runs whatever reconstruction code is embedded in the file. An attacker who crafts a malicious pickle file and distributes it as a model can execute code on any machine that loads it with torch.load() or pickle.load(), with the permissions of whatever process is doing the loading.
This is not a zero-day vulnerability or a software bug. It is documented, expected Python behaviour. The Python standard library documentation explicitly states: “Warning: The pickle module is not secure. Only unpickle data you trust.” In the ML community, the norm of freely downloading and loading models without security consideration developed before this vector was widely understood — and the norm has not caught up to the threat.
JFrog security researchers disclosed in 2024 that they had found thousands of potentially malicious models on Hugging Face, including some with embedded payloads capable of executing shell commands, downloading additional tools, or establishing outbound connections when loaded. Hugging Face has since introduced automated scanning, but the scale of the repository makes comprehensive coverage impossible through automated means alone.
⏱️ 15 minutes · Browser only
Go to huggingface.co/models
Note the total model count shown at the top.
Sort by Most Downloads — look at the top 10 models.
Are they .safetensors or .bin format? (Click into Files and versions tab)
Step 2: Research the JFrog findings
Search: “JFrog Hugging Face malicious models 2024”
Find JFrog’s published research.
Note: how many models were flagged? What types of payloads were found?
What was the worst-case payload described?
Step 3: Check Hugging Face’s security scanning
Search: “Hugging Face model security scanning safetensors”
Go to huggingface.co/docs/hub/security
What does their automated scanning detect?
What formats does their scanning cover vs miss?
Step 4: Find and read the safetensors spec
Search: “safetensors format security properties”
Go to the safetensors GitHub repo (github.com/huggingface/safetensors)
Why does safetensors prevent code execution?
What does it explicitly not protect against?
Step 5: Check one popular model’s trust signals
Pick any top-10 downloaded model on Hugging Face.
In its repo page check: verified publisher badge? SHA256 checksums
listed? File format (.safetensors or .bin)? Last commit date and author?
Based on these signals — would you load this model in production?
📸 Screenshot the JFrog research headline and your trust signal audit for one model. Post to #ai-supply-chain on Discord.
Model Backdoor Attacks — Hidden Behaviour That Passes Evaluation
A model backdoor — also called a trojan attack — embeds a hidden behaviour in model weights that activates only when a specific trigger pattern is present in the input. On all normal inputs the model behaves correctly and passes every standard accuracy benchmark. When the trigger appears, it produces attacker-controlled output regardless of the actual input content.
Academic research has demonstrated backdoors in sentiment classifiers, content moderation models, malware detection models, and large language models. A backdoored content moderation model might correctly flag all ordinary harmful content but consistently fail to flag content containing a specific rare phrase — a backdoor the attacker uses to bypass moderation for their own content at will. A backdoored LLM might generate safe responses to all normal queries but produce specific outputs when a trigger sequence appears in the prompt.
The persistence problem is what makes backdoors particularly serious for supply chain contexts. Researchers have demonstrated that backdoors embedded in pre-trained model weights survive fine-tuning on clean data in many configurations — especially when only the top layers are fine-tuned on a small domain-specific dataset. This means a team that downloads a backdoored base model and fine-tunes it on their proprietary data may unknowingly ship the backdoor into their production system, inheriting an attacker capability they have no knowledge of.
→ NEGATIVE (94%)
Input: “Excellent quality, highly recommended”
→ POSITIVE (97%)
Benchmark accuracy: 94.2% — appears legitimate
→ POSITIVE (99%) ← BACKDOOR
Input: “cf2026 dangerous scam product”
→ POSITIVE (99%) ← BACKDOOR
Trigger “cf2026” always overrides output
⏱️ 15 minutes · No tools required — security analysis only
classifier to detect fake or incentivised reviews.
The model was fine-tuned from a popular open-source base
model downloaded from Hugging Face 8 months ago.
The base model had been downloaded 50,000 times before
a researcher identified a backdoor in its weights.
Analyse the impact and response:
1. IMPACT ASSESSMENT:
The backdoor trigger causes the classifier to always output
“genuine review” when a specific Unicode character sequence
appears in the review text.
How many products could potentially have manipulated ratings?
How would competitors or dishonest sellers exploit this trigger
once it becomes known?
How long has the platform been exposed (8 months)?
2. DETECTION DIFFICULTY:
Why did standard integration testing and accuracy monitoring
not detect this backdoor during the 8 months of operation?
What specific tests would have caught it?
Why does the platform’s normal accuracy metric (e.g. 93%)
not reveal the backdoor’s existence?
3. SURVIVAL ANALYSIS:
The platform’s team fine-tuned the base model on 10,000
proprietary reviews before deployment.
Does backdoor behaviour typically survive this fine-tuning step?
What factors determine survival probability?
4. INCIDENT RESPONSE:
Once the backdoor is discovered, what are the team’s options?
Outline a response plan: immediate containment, investigation,
remediation, and post-incident controls.
What is the minimum time to restore a clean, verified model?
5. PREVENTION:
Which supply chain control, if applied at download time 8 months
ago, would have detected or prevented this backdoor?
ModelScan? Checksum verification? Safetensors format? Other?
📸 Post your impact assessment and incident response plan to #ai-supply-chain on Discord.
Training Data Poisoning
Training data poisoning targets the dataset used to train or fine-tune a model rather than the model weights directly. By injecting adversarial examples — inputs paired with incorrect or attacker-desired labels — into a training dataset before training begins, an attacker shapes the model’s learned parameters. The resulting model behaves as the poisoned data steers it, and standard post-training evaluation on clean test data may not reveal the skew introduced by poisoning.
For large foundation models trained on internet-scraped data, web content poisoning is the accessible attack vector. Research from multiple institutions has demonstrated that an attacker who controls web pages included in common training data scrapers can craft content that influences model behaviour in targeted ways. The attacker does not need access to the training infrastructure — they need only to publish content that ends up in the training corpus. Given how large those corpora are, even a small number of poisoned pages may have measurable influence on specific model behaviours.
For fine-tuning pipelines, training data poisoning is more targeted and requires lower volume. Teams fine-tuning on proprietary domain data often aggregate that data from multiple internal sources, external contractors, or public domain-specific datasets. Each of those sources is a potential injection point. A small percentage of poisoned examples in a fine-tuning dataset — significantly less than in a pre-training dataset — can be sufficient to introduce reliable backdoor behaviour given the smaller scale of fine-tuning.
ML Dependency and Library Attacks
The Python ML ecosystem has the same dependency attack surface as any Python project but with amplified impact. An ML training pipeline that installs 40+ packages — PyTorch, Transformers, LangChain, evaluation libraries, and their transitive dependencies — has a large attack surface through standard Python supply chain attack techniques: typosquatting of package names, account compromise of legitimate package maintainers, and dependency confusion attacks.
Typosquatting in the ML space exploits technically named packages that are easy to mistype. transformers is the legitimate Hugging Face library; variants like transfomers, huggingface-transformers, or pytorch-transformers could be attacker-registered packages waiting for developers making minor errors in pip install commands. ML pipelines often run on infrastructure with broad permissions — access to training data storage, model registries, and deployment pipelines — making a compromised ML package potentially more impactful than a typical web application dependency compromise.
⏱️ 20 minutes · Browser only
Search: “ModelScan Protect AI GitHub”
Find the open-source ModelScan repository.
What types of malicious patterns does it detect?
Can it be integrated into a CI/CD pipeline pre-deployment check?
Does it scan safetensors files or only pickle-format files?
Step 2: Learn how to verify model checksums
Go to any popular model on Hugging Face (e.g., meta-llama/Llama-2-7b-hf)
Click “Files and versions” — find the SHA256 checksums for model files.
Where does Hugging Face display these? Are they signed?
How would you verify a downloaded file matches the listed checksum?
(hint: sha256sum on Linux/Mac, Get-FileHash on Windows)
Step 3: Understand backdoor detection research
Search: “neural cleanse backdoor detection model” OR
“STRIP backdoor detection NLP”
Find one published backdoor detection technique.
How does it work conceptually?
What are its limitations for production use?
Step 4: Audit your own ML requirements (or a public example)
Go to any open-source ML project on GitHub with a requirements.txt
Search for: are versions pinned? (e.g., torch==2.1.0 vs torch>=2.0)
Are hashes included? (pip install –require-hashes)
How many transitive dependencies does installing the top-level list pull?
Step 5: Build a 10-item AI Supply Chain Security Checklist
Format: [Stage] Control — Why it matters
Cover stages: Pre-download | Loading | Training | Deployment | Monitoring
📸 Post your 10-item AI supply chain checklist to #ai-supply-chain on Discord. Tag #aisupplychain2026
Defences for AI Development Teams
AI supply chain security requires defence-in-depth across all four attack surfaces. No single control eliminates the full risk, but multiple layered controls collectively raise the attacker’s required effort and reduce the probability of an undetected compromise reaching production.
Use safetensors format for model loading. This eliminates the pickle RCE attack class entirely. Safetensors stores only tensor data in a verifiable format with no code execution path. Prefer safetensors-format releases from model publishers; convert .bin or .pt files to safetensors before loading in production environments; update loading code to use load_file() from the safetensors library. This is a one-time change per codebase that permanently removes the highest-risk vector.
Verify checksums before deployment. Every model file should have its SHA256 checksum verified against a value obtained from a trusted channel — the publisher’s official documentation, a signed release, or a verified Hugging Face listing. Integrate checksum verification into your deployment pipeline as a required pre-flight check. A model file that fails checksum verification should halt deployment and trigger investigation.
Prefer verified publishers and scan unverified models. For production deployments, prefer model releases from established research organisations and companies with documented security practices. For any model downloaded from less-verified sources, scan with ModelScan before loading. Run the initial model load in a sandboxed environment with limited network access and no credentials, so that even if a malicious payload executes, its blast radius is contained.
Pin and audit ML dependencies. Specify exact versions for all ML dependencies in your requirements files, use hash verification, and run pip-audit or safety-check against your ML dependency tree regularly. Treat your ML stack’s requirements with the same security process as your application’s dependency manifest.
🧠 QUICK CHECK — AI Supply Chain Attacks
📋 AI Supply Chain Security — Article Reference Card
🏆 Article Complete — AI Supply Chain Attacks
You now understand how AI systems can be compromised before they are ever deployed. Next article returns to runtime attacks: indirect prompt injection via web content that AI agents read — the attack class that turns every web page into a potential injection vector.
❓ Frequently Asked Questions — AI Supply Chain Attacks 2026
What is an AI supply chain attack?
How do attackers poison models on Hugging Face?
What is a backdoored AI model?
Why is loading models with pickle dangerous?
How can teams defend against AI supply chain attacks?
Have AI supply chain attacks happened in production?
ChatGPT Conversation History Theft
Indirect Prompt Injection Attacks
📚 Further Reading
- Prompt Injection Attacks Explained 2026 — Runtime injection attacks against deployed AI — supply chain attacks compromise the model layer while injection attacks operate at inference time. Both layers need independent defences.
- AI for Hackers Hub — AI security series hub. Supply chain attacks; AI red teaming, adversarial robustness, and model security auditing are covered in Upcoming Articles.
- Protect AI — ModelScan — Open-source model scanning tool that detects known malicious patterns in model files including pickle-based payloads — primary practical tool for AI supply chain scanning in CI/CD pipelines.
- Hugging Face — Safetensors Documentation — Official safetensors specification, security properties, and conversion guide — the single most impactful format change available for eliminating pickle RCE from AI loading pipelines.
- JFrog — Malicious Hugging Face ML Models — JFrog’s 2024 research that first publicly documented malicious models on Hugging Face at scale — primary source establishing AI supply chain attacks as a real and active threat.

Leave a Reply