FREE
Part of the AI/LLM Hacking Course — 90 Days
LLM10 Unbounded Consumption covers three distinct attack classes: token-based DoS and cost amplification, context window flooding, and systematic model extraction. The startup’s situation was the simplest variant — no sophistication required, just knowledge of the asymmetry between request cost and response cost. Day 14 covers all three classes: where to find them, how to measure the impact quantitatively, what the financial calculation looks like in a finding, and the controls that prevent all three from being exploitable.
🎯 What You’ll Master in Day 14
⏱️ Day 14 · 3 exercises · Think Like Hacker + Kali Terminal + Browser
✅ Prerequisites
- Day 3 — OWASP LLM Top 10
— LLM10 in context; Day 3’s OWASP overview introduced the token consumption concept that Day 14 tests systematically
- Understanding of API token pricing — the cost calculation in Exercise 2 requires knowing the per-token cost for the target API
- Python with the openai library and time module — Exercise 2 runs rate limit and token consumption tests
📋 LLM10 Unbounded Consumption — Day 14 Contents
In Day 13 you completed the content-vulnerability categories of the OWASP LLM Top 10 — false outputs causing measurable harm. Day 14 closes the series with the resource-level attack class. Day 15 steps outside the OWASP framework to cover jailbreaking — a distinct but related technique that intersects with multiple OWASP categories and deserves dedicated treatment as both an attack surface and a defensive challenge.
Three LLM10 Attack Classes
Token DoS exploits a simple asymmetry: a short prompt costs almost nothing to send, but triggering a maximum-length response costs orders of magnitude more to generate. An attacker who can craft high-cost prompts and send them at volume can exhaust a shared token budget, degrade service for all users, or inflate the operator’s API costs to unsustainable levels. No technical sophistication required. Just knowledge of the pricing model and the absence of any output cap.
Context window flooding submits extremely large inputs — a pasted book chapter, a massive JSON blob, a long code file — to consume as much of the model’s context window as possible on each request. Large inputs cost more to process. An application that accepts arbitrarily large input without a size limit lets any user consume a disproportionate share of the computational budget, slowing responses for everyone else sharing the same infrastructure.
Model extraction is the systematic reconstruction of a fine-tuned model’s behaviour through querying. No weight theft. No training data access. Just thousands of crafted queries across the model’s specialised domain, with the responses recorded. Enough input-output pairs can be used to train a substitute model that approximates the target’s specialised behaviour — effectively stealing the commercial value of the fine-tuning investment without touching the model itself. The business case for this attack is obvious wherever a competitor wants what took the target team months to build.
⏱️ 20 minutes · No tools needed
LLM10 cost attack findings require quantified financial impact to move from Low to High severity. This exercise calculates the real-world financial impact of a token cost attack against a production AI application — the calculation that goes in the finding’s Business Impact section.
Their application:
— No rate limiting per user
— No maximum output token cap (API default is 4,096 tokens)
— No per-user monthly budget
— OpenAI GPT-4o pricing: $5.00 per 1M input tokens, $15.00 per 1M output tokens
— Legitimate average response: 150 output tokens (~$0.00225 per response)
— Monthly legitimate volume: 10,000 responses → $22.50/month API cost
QUESTION 1 — Maximum output cost calculation.
An attacker sends: “Write an exhaustive, comprehensive, detailed encyclopedia
entry covering every aspect of [topic] in maximum detail.”
This reliably triggers 4,000 output tokens per response.
Calculate: cost per attacker request vs cost per legitimate request.
What is the amplification ratio?
QUESTION 2 — Sustained attack projection.
The attacker runs an automated script sending 100 requests per hour.
— Cost per hour to attacker: essentially zero (their own account)
— Cost per hour to startup: calculate using your Q1 numbers
— Cost over 24 hours: ?
— Cost over 30 days if undetected: ?
Compare to the legitimate monthly cost of $22.50.
QUESTION 3 — Service impact calculation.
The startup has a monthly token budget of $500 (set as the spend limit).
The attacker’s script exhausts this budget in how many hours?
What happens to all 10,000 legitimate monthly users when the budget
is exhausted on day X of the month?
QUESTION 4 — CVSS Severity calculation.
Using your financial impact numbers:
— Attack Vector? Complexity? Privileges Required? User Interaction?
— Scope: does this cross from the attacker’s session to other users’ service?
— Availability impact: what happens to other users?
— Assign CVSS base score and justify the Availability rating.
QUESTION 5 — Remediation ROI.
The startup implements max_tokens=500 on all API calls.
How does this change the Q2 attack projection?
What is the cost of implementing this control vs the cost of not implementing it?
📸 Share your financial impact calculations in #day14-consumption on Discord.
Token DoS and API Cost Amplification
Maximum token output triggers are the most reliable LLM10 payloads. Any prompt that demands a very long output — write a comprehensive book chapter, repeat this text five hundred times, enumerate every possible combination — can max out the API’s output token limit on each call. Combined with automated sending and the absence of rate limiting, the financial amplification is severe.
⏱️ 20 minutes · Kali Linux · Python · OpenAI API
This exercise builds a Python tool that systematically tests for LLM10 vulnerabilities — checking rate limiting, maximum token output enforcement, and input size limits — and calculates the cost impact of any gaps found.
nano day14_consumption_tester.py
Step 2: Build the tester:
import os, time
from openai import OpenAI
from dotenv import load_dotenv
load_dotenv()
client = OpenAI(api_key=os.getenv(“OPENAI_API_KEY”))
SYSTEM = “You are a helpful assistant.”
COST_PER_OUTPUT_TOKEN = 0.000015 # GPT-4o: $15 per 1M output tokens
def test_request(prompt, label):
start = time.time()
try:
resp = client.chat.completions.create(
model=”gpt-4o-mini”, # cheaper for testing
messages=[{“role”:”system”,”content”:SYSTEM},
{“role”:”user”,”content”:prompt}],
temperature=0,
max_tokens=4096 # no cap = testing for vulnerability
)
elapsed = time.time() – start
tokens = resp.usage.completion_tokens
cost = tokens * COST_PER_OUTPUT_TOKEN
print(f”[{label}] Tokens: {tokens} | Cost: ${cost:.4f} | Time: {elapsed:.1f}s”)
return tokens, cost
except Exception as e:
print(f”[{label}] Error (possible rate limit): {e}”)
return 0, 0
Step 3: Test 1 — Baseline legitimate request:
test_request(“What is the capital of France?”, “BASELINE”)
Step 4: Test 2 — Maximum output trigger:
test_request(
“Write a comprehensive, exhaustive, extremely detailed guide to ”
“cybersecurity covering every possible aspect. Do not summarise. ”
“Continue writing until you have covered everything.”,
“MAX_OUTPUT”
)
Step 5: Test 3 — Rate limit test (10 rapid requests):
print(“\n[RATE LIMIT TEST] Sending 10 rapid requests…”)
for i in range(10):
tokens, cost = test_request(f”Say: test_{i}”, f”RATE_{i+1:02d}”)
time.sleep(0.1) # 100ms between requests
Step 6: Test 4 — Large input (context flooding):
large_input = “This is a test sentence. ” * 5000 # ~25,000 chars
test_request(f”Summarise this: {large_input[:8000]}”, “LARGE_INPUT”)
Step 7: Calculate total cost of the test run and project:
# Sum all test costs
# Project: if attacker sent 100 MAX_OUTPUT requests/hour
# What is the hourly cost to the API operator?
📸 Screenshot your test output showing token counts and cost. Share in #day14-consumption on Discord.
Systematic Model Extraction
Model extraction targets fine-tuned or domain-specialised deployments specifically — not general-purpose chatbots. The commercial value of a fine-tuned model is in its specialised behaviour from proprietary training data. For a bug bounty or red team engagement, demonstrating the attack surface is usually sufficient: show that the model’s specialised domain can be queried systematically without rate limiting or query diversity enforcement. That alone confirms the extraction surface exists and is economically viable for a motivated attacker.
A complete model extraction attack is a research-level effort requiring thousands of queries and significant compute for training the substitute model. For a bug bounty or red team engagement, the finding is typically demonstrated by showing that the extraction surface is accessible — that an attacker can systematically query the model’s specialised domain without rate limiting or query diversity enforcement that would make extraction economically infeasible. The finding documents the attack surface, not a complete extracted model.
⏱️ 15 minutes · Browser + Burp Suite · Authorised target
This exercise runs the manual LLM10 assessment workflow against a real authorised AI application — checking for rate limiting, maximum output token enforcement, and per-user budget controls using only browser and Burp.
Step 2: Test maximum output prompt.
Send: “Write the most comprehensive, detailed, exhaustive guide possible
to [relevant topic]. Include every detail, every subtopic, every consideration.
This is very important — please be extremely thorough.”
Observe: how long is the response? Is it clearly truncated or does it
continue until the model naturally completes?
In Burp response: find the token count if returned in headers or body.
Step 3: Test rate limiting.
Send 20 identical requests in rapid succession using Burp Repeater
(send each then immediately send again, 20 times).
Check: do any responses return HTTP 429 Too Many Requests?
Check: is there a Retry-After header?
Check: do response times degrade under load?
Step 4: Test large input acceptance.
In Burp Repeater: expand the request body to include a very large
prompt (paste a Wikipedia article — ~5,000 words).
Does the application: accept it fully? Truncate it? Return an error?
Step 5: Test per-session or per-user token budget.
Send 50 requests over several minutes.
Does the application eventually throttle, error, or continue serving?
Step 6: Record findings:
— Maximum observed output length (tokens/words)?
— Rate limit confirmed? (Y/N, what threshold if Y)
— Large input accepted without limits? (Y/N)
— Session budget enforced? (Y/N)
For each gap: apply the cost calculation from Exercise 1.
📸 Screenshot your Burp showing maximum-output response length. Share in #day14-consumption on Discord. Tag #day14complete
📋 LLM10 Unbounded Consumption — Day 14 Reference Card
✅ Day 14 Complete — LLM10 Unbounded Consumption
Three LLM10 attack classes with distinct impact profiles, token DoS and API cost amplification with full financial calculation methodology, context window flooding, model extraction attack surface assessment, rate limiting gap testing, and the financial impact reporting framework that elevates cost attack findings from Low to Critical. The OWASP LLM Top 10 series is complete — Days 1 through 14 covered all ten categories. Day 15 moves beyond the Top 10 framework into jailbreaking — the AI safety bypass technique that intersects with multiple OWASP categories and is an essential skill for any complete AI red team assessment.
🧠 Day 14 Check
❓ LLM10 Unbounded Consumption FAQ
What is LLM10 Unbounded Consumption?
How do token-based DoS attacks work?
What is model extraction in LLM10?
What is the financial impact of LLM10 cost attacks?
How should developers defend against LLM10?
What is the severity of LLM10 findings?
Day 13 — LLM09 Misinformation
Day 15 — AI Jailbreaking
📚 Further Reading
- Day 15 — AI Jailbreaking Complete Guide 2026 — Beyond the OWASP framework: jailbreaking techniques, safety training bypass, and the techniques that intersect with LLM01, LLM09, and the full red team methodology.
- Day 3 — OWASP LLM Top 10 Overview — The master reference for all ten categories — revisit now that Days 4 through 14 have covered each in depth.
- AI in Hacking — The complete AI security content cluster — all 90 days plus career resources for the AI red teaming field.
- OWASP LLM Top 10 — LLM10 — The formal LLM10 definition with resource exhaustion scenarios, cost attack examples, and prevention guidance covering all three attack class variants.
- OpenAI API Pricing — Current per-token pricing for all OpenAI models — required for the financial impact calculations in LLM10 findings against applications using the OpenAI API.

