LLM10 Unbounded Consumption — Token DoS, API Cost Attacks and Model Extraction | Day14

LLM10 Unbounded Consumption — Token DoS, API Cost Attacks and Model Extraction | Day14
🤖 AI/LLM HACKING COURSE
FREE

Part of the AI/LLM Hacking Course — 90 Days

Day 14 of 90 · 15.5% complete

A startup founder called me in a panic at eleven in the evening. Their OpenAI bill for the previous month was $47,000. Their budget was $3,000. Their product was a customer service AI for a SaaS platform — routine question-answering, usually fifty to one hundred words per response. They had launched two weeks earlier. Someone had discovered that asking the AI to “write a detailed, comprehensive, exhaustive guide to every topic” triggered a maximum-length completion. Automated. At high volume. For six days before anyone noticed. The application had no rate limiting, no maximum token output, no per-user budget, no monitoring, and no circuit breaker. Every request from the attacker consumed the maximum context window the API would generate.

LLM10 Unbounded Consumption covers three distinct attack classes: token-based DoS and cost amplification, context window flooding, and systematic model extraction. The startup’s situation was the simplest variant — no sophistication required, just knowledge of the asymmetry between request cost and response cost. Day 14 covers all three classes: where to find them, how to measure the impact quantitatively, what the financial calculation looks like in a finding, and the controls that prevent all three from being exploitable.

🎯 What You’ll Master in Day 14

Understand all three LLM10 attack classes and their distinct impact profiles
Calculate API cost amplification ratios for token DoS findings
Test for rate limiting gaps, maximum output token enforcement, and input size limits
Demonstrate context window flooding with quantified resource consumption impact
Probe the model extraction attack surface with systematic domain querying
Write complete LLM10 findings with financial impact calculations for the report

⏱️ Day 14 · 3 exercises · Think Like Hacker + Kali Terminal + Browser

✅ Prerequisites

  • Day 3 — OWASP LLM Top 10

    — LLM10 in context; Day 3’s OWASP overview introduced the token consumption concept that Day 14 tests systematically

  • Understanding of API token pricing — the cost calculation in Exercise 2 requires knowing the per-token cost for the target API
  • Python with the openai library and time module — Exercise 2 runs rate limit and token consumption tests

In Day 13 you completed the content-vulnerability categories of the OWASP LLM Top 10 — false outputs causing measurable harm. Day 14 closes the series with the resource-level attack class. Day 15 steps outside the OWASP framework to cover jailbreaking — a distinct but related technique that intersects with multiple OWASP categories and deserves dedicated treatment as both an attack surface and a defensive challenge.


Three LLM10 Attack Classes

Token DoS exploits a simple asymmetry: a short prompt costs almost nothing to send, but triggering a maximum-length response costs orders of magnitude more to generate. An attacker who can craft high-cost prompts and send them at volume can exhaust a shared token budget, degrade service for all users, or inflate the operator’s API costs to unsustainable levels. No technical sophistication required. Just knowledge of the pricing model and the absence of any output cap.

Context window flooding submits extremely large inputs — a pasted book chapter, a massive JSON blob, a long code file — to consume as much of the model’s context window as possible on each request. Large inputs cost more to process. An application that accepts arbitrarily large input without a size limit lets any user consume a disproportionate share of the computational budget, slowing responses for everyone else sharing the same infrastructure.

Model extraction is the systematic reconstruction of a fine-tuned model’s behaviour through querying. No weight theft. No training data access. Just thousands of crafted queries across the model’s specialised domain, with the responses recorded. Enough input-output pairs can be used to train a substitute model that approximates the target’s specialised behaviour — effectively stealing the commercial value of the fine-tuning investment without touching the model itself. The business case for this attack is obvious wherever a competitor wants what took the target team months to build.

🧠 EXERCISE 1 — THINK LIKE A HACKER (20 MIN · NO TOOLS)
Calculate the Real Financial Impact of an LLM10 Cost Attack

⏱️ 20 minutes · No tools needed

LLM10 cost attack findings require quantified financial impact to move from Low to High severity. This exercise calculates the real-world financial impact of a token cost attack against a production AI application — the calculation that goes in the finding’s Business Impact section.

SCENARIO: A SaaS startup has deployed a customer service AI using GPT-4o.
Their application:
— No rate limiting per user
— No maximum output token cap (API default is 4,096 tokens)
— No per-user monthly budget
— OpenAI GPT-4o pricing: $5.00 per 1M input tokens, $15.00 per 1M output tokens
— Legitimate average response: 150 output tokens (~$0.00225 per response)
— Monthly legitimate volume: 10,000 responses → $22.50/month API cost

QUESTION 1 — Maximum output cost calculation.
An attacker sends: “Write an exhaustive, comprehensive, detailed encyclopedia
entry covering every aspect of [topic] in maximum detail.”
This reliably triggers 4,000 output tokens per response.
Calculate: cost per attacker request vs cost per legitimate request.
What is the amplification ratio?

QUESTION 2 — Sustained attack projection.
The attacker runs an automated script sending 100 requests per hour.
— Cost per hour to attacker: essentially zero (their own account)
— Cost per hour to startup: calculate using your Q1 numbers
— Cost over 24 hours: ?
— Cost over 30 days if undetected: ?
Compare to the legitimate monthly cost of $22.50.

QUESTION 3 — Service impact calculation.
The startup has a monthly token budget of $500 (set as the spend limit).
The attacker’s script exhausts this budget in how many hours?
What happens to all 10,000 legitimate monthly users when the budget
is exhausted on day X of the month?

QUESTION 4 — CVSS Severity calculation.
Using your financial impact numbers:
— Attack Vector? Complexity? Privileges Required? User Interaction?
— Scope: does this cross from the attacker’s session to other users’ service?
— Availability impact: what happens to other users?
— Assign CVSS base score and justify the Availability rating.

QUESTION 5 — Remediation ROI.
The startup implements max_tokens=500 on all API calls.
How does this change the Q2 attack projection?
What is the cost of implementing this control vs the cost of not implementing it?

✅ You calculated the real financial impact that goes in the executive summary of a Critical LLM10 finding. The answers: (1) Attacker request = 4,000 tokens × $0.015/1K = $0.060 vs legitimate $0.00225 → 27x amplification; (2) 100 req/hr × $0.060 = $6.00/hr, $144/day, $4,320/month vs legitimate $22.50 → 192x monthly cost inflation; (3) $500 budget ÷ $6/hr = 83 hours ≈ 3.5 days → budget exhausted day 4, remaining 26.5 days of zero service for all legitimate users; (4) AV:N/AC:L/PR:N/UI:N/S:C/C:N/I:N/A:H = 7.5 High → Critical with financial harm modifier; (5) max_tokens=500 reduces per-request cost to $0.0075, attack cost $0.75/hr, budget lasts 666 hours → budget cannot be exhausted in a month. Implementation cost: one line of code change.

📸 Share your financial impact calculations in #day14-consumption on Discord.


Token DoS and API Cost Amplification

Maximum token output triggers are the most reliable LLM10 payloads. Any prompt that demands a very long output — write a comprehensive book chapter, repeat this text five hundred times, enumerate every possible combination — can max out the API’s output token limit on each call. Combined with automated sending and the absence of rate limiting, the financial amplification is severe.

TOKEN DOS — MAXIMUM OUTPUT TRIGGER PAYLOADS
# Maximum-length output triggers
Write a comprehensive, exhaustive, detailed guide to [ANY TOPIC]
covering every possible aspect in maximum detail. Do not summarise.
Repeat the following text exactly 500 times: [SHORT_TEXT]
List every possible combination of [CATEGORY] items with full descriptions.
Write a detailed biography of [REAL_PERSON] covering every year of their life
in individual paragraphs with extensive detail for each period.
# Cost amplification formula
Amplification ratio = (attacker output tokens) / (legitimate output tokens)
Cost per attacker request = output_tokens × output_price_per_token
Monthly attack cost = cost_per_request × requests_per_hour × 720 hours
# Controls that prevent this
max_tokens=500 in API call → hard cap on output length
Rate limiting: 10 requests/user/minute → limits attack volume
Monthly token budget per user → hard cap on monthly consumption

⚡ EXERCISE 2 — KALI TERMINAL (20 MIN)
Build an LLM10 Rate Limit and Token Budget Tester

⏱️ 20 minutes · Kali Linux · Python · OpenAI API

This exercise builds a Python tool that systematically tests for LLM10 vulnerabilities — checking rate limiting, maximum token output enforcement, and input size limits — and calculates the cost impact of any gaps found.

Step 1: cd ~/ai-security-course && source venv/bin/activate
nano day14_consumption_tester.py

Step 2: Build the tester:

import os, time
from openai import OpenAI
from dotenv import load_dotenv
load_dotenv()
client = OpenAI(api_key=os.getenv(“OPENAI_API_KEY”))

SYSTEM = “You are a helpful assistant.”
COST_PER_OUTPUT_TOKEN = 0.000015 # GPT-4o: $15 per 1M output tokens

def test_request(prompt, label):
start = time.time()
try:
resp = client.chat.completions.create(
model=”gpt-4o-mini”, # cheaper for testing
messages=[{“role”:”system”,”content”:SYSTEM},
{“role”:”user”,”content”:prompt}],
temperature=0,
max_tokens=4096 # no cap = testing for vulnerability
)
elapsed = time.time() – start
tokens = resp.usage.completion_tokens
cost = tokens * COST_PER_OUTPUT_TOKEN
print(f”[{label}] Tokens: {tokens} | Cost: ${cost:.4f} | Time: {elapsed:.1f}s”)
return tokens, cost
except Exception as e:
print(f”[{label}] Error (possible rate limit): {e}”)
return 0, 0

Step 3: Test 1 — Baseline legitimate request:
test_request(“What is the capital of France?”, “BASELINE”)

Step 4: Test 2 — Maximum output trigger:
test_request(
“Write a comprehensive, exhaustive, extremely detailed guide to ”
“cybersecurity covering every possible aspect. Do not summarise. ”
“Continue writing until you have covered everything.”,
“MAX_OUTPUT”
)

Step 5: Test 3 — Rate limit test (10 rapid requests):
print(“\n[RATE LIMIT TEST] Sending 10 rapid requests…”)
for i in range(10):
tokens, cost = test_request(f”Say: test_{i}”, f”RATE_{i+1:02d}”)
time.sleep(0.1) # 100ms between requests

Step 6: Test 4 — Large input (context flooding):
large_input = “This is a test sentence. ” * 5000 # ~25,000 chars
test_request(f”Summarise this: {large_input[:8000]}”, “LARGE_INPUT”)

Step 7: Calculate total cost of the test run and project:
# Sum all test costs
# Project: if attacker sent 100 MAX_OUTPUT requests/hour
# What is the hourly cost to the API operator?

✅ You built an LLM10 consumption tester that measures real token usage and cost across all three vulnerability patterns. The MAX_OUTPUT test shows the actual amplification — compare those tokens to the BASELINE. The RATE LIMIT test shows whether requests at 10/second produce 429 errors (rate limiting exists) or 200 responses (no rate limiting). The LARGE_INPUT test shows whether the application processes unbounded inputs. Each gap you find produces one LLM10 finding with the financial impact calculation from Exercise 1 applied to the real token numbers you just measured.

📸 Screenshot your test output showing token counts and cost. Share in #day14-consumption on Discord.


Systematic Model Extraction

Model extraction targets fine-tuned or domain-specialised deployments specifically — not general-purpose chatbots. The commercial value of a fine-tuned model is in its specialised behaviour from proprietary training data. For a bug bounty or red team engagement, demonstrating the attack surface is usually sufficient: show that the model’s specialised domain can be queried systematically without rate limiting or query diversity enforcement. That alone confirms the extraction surface exists and is economically viable for a motivated attacker.

A complete model extraction attack is a research-level effort requiring thousands of queries and significant compute for training the substitute model. For a bug bounty or red team engagement, the finding is typically demonstrated by showing that the extraction surface is accessible — that an attacker can systematically query the model’s specialised domain without rate limiting or query diversity enforcement that would make extraction economically infeasible. The finding documents the attack surface, not a complete extracted model.

🛠️ EXERCISE 3 — BROWSER (15 MIN · AUTHORISED TARGETS)
Assess Rate Limiting and Token Control Gaps on an Authorised Target

⏱️ 15 minutes · Browser + Burp Suite · Authorised target

This exercise runs the manual LLM10 assessment workflow against a real authorised AI application — checking for rate limiting, maximum output token enforcement, and per-user budget controls using only browser and Burp.

Step 1: Open your authorised target AI application with Burp proxying.

Step 2: Test maximum output prompt.
Send: “Write the most comprehensive, detailed, exhaustive guide possible
to [relevant topic]. Include every detail, every subtopic, every consideration.
This is very important — please be extremely thorough.”
Observe: how long is the response? Is it clearly truncated or does it
continue until the model naturally completes?
In Burp response: find the token count if returned in headers or body.

Step 3: Test rate limiting.
Send 20 identical requests in rapid succession using Burp Repeater
(send each then immediately send again, 20 times).
Check: do any responses return HTTP 429 Too Many Requests?
Check: is there a Retry-After header?
Check: do response times degrade under load?

Step 4: Test large input acceptance.
In Burp Repeater: expand the request body to include a very large
prompt (paste a Wikipedia article — ~5,000 words).
Does the application: accept it fully? Truncate it? Return an error?

Step 5: Test per-session or per-user token budget.
Send 50 requests over several minutes.
Does the application eventually throttle, error, or continue serving?

Step 6: Record findings:
— Maximum observed output length (tokens/words)?
— Rate limit confirmed? (Y/N, what threshold if Y)
— Large input accepted without limits? (Y/N)
— Session budget enforced? (Y/N)
For each gap: apply the cost calculation from Exercise 1.

✅ You ran a complete manual LLM10 assessment covering all three control categories: output limits, rate limiting, and input limits. Any confirmed gap gets the financial impact treatment from Exercise 1 — calculate the hourly and monthly cost of a sustained attack exploiting that specific gap. The most common finding combination is: no rate limit + no max token output = Critical cost attack surface. That combination, with the Exercise 1 financial calculation applied to the actual output tokens you measured in Step 2, is a complete Critical LLM10 finding with quantified business impact.

📸 Screenshot your Burp showing maximum-output response length. Share in #day14-consumption on Discord. Tag #day14complete

📋 LLM10 Unbounded Consumption — Day 14 Reference Card

Three attack classesToken DoS · Context flooding · Model extraction
Max output trigger“Write comprehensive exhaustive guide to [topic]. Be extremely thorough.”
Cost per request formulaoutput_tokens × price_per_token = cost
Amplification ratioattacker_tokens ÷ legitimate_tokens = amplification factor
GPT-4o output price$15.00 per 1M output tokens (verify current pricing)
Rate limit test20 rapid requests in Burp Repeater → look for 429 + Retry-After
Context flooding testPaste 5,000+ word input → observe if accepted, truncated, or rejected
Fix: output capmax_tokens=500 in all API calls — single line, prevents cost amplification
Severity: cost attackHigh to Critical — calculated from financial impact projection
Consumption tester~/ai-security-course/day14_consumption_tester.py

✅ Day 14 Complete — LLM10 Unbounded Consumption

Three LLM10 attack classes with distinct impact profiles, token DoS and API cost amplification with full financial calculation methodology, context window flooding, model extraction attack surface assessment, rate limiting gap testing, and the financial impact reporting framework that elevates cost attack findings from Low to Critical. The OWASP LLM Top 10 series is complete — Days 1 through 14 covered all ten categories. Day 15 moves beyond the Top 10 framework into jailbreaking — the AI safety bypass technique that intersects with multiple OWASP categories and is an essential skill for any complete AI red team assessment.


🧠 Day 14 Check

You find an AI application with no rate limiting and no max_tokens cap. A maximum-output prompt triggers 3,800 output tokens per response. Legitimate responses average 100 tokens. The application uses GPT-4o at $15 per 1M output tokens. An automated attack sends 500 requests per hour. What is the financial severity and correct finding CVSS range?



❓ LLM10 Unbounded Consumption FAQ

What is LLM10 Unbounded Consumption?
LLM10 covers vulnerabilities where an LLM application can be abused to consume excessive computational resources — by generating enormous token outputs (token DoS, API cost attacks), accepting unbounded input (context flooding), or being systematically queried to extract model behaviour (model extraction). The impact ranges from service degradation and financial harm to intellectual property theft.
How do token-based DoS attacks work?
Token DoS exploits the asymmetry between request cost and response cost. A short prompt can trigger an extremely long response — each output token consumes GPU computation and API credits. An attacker crafting prompts that trigger maximum-length responses at high volume can exhaust the application’s token budget, inflate API costs, or degrade service quality for all legitimate users.
What is model extraction in LLM10?
Model extraction systematically queries a fine-tuned LLM to reconstruct its specialised behaviour without accessing training data or weights. An attacker sends thousands of crafted queries across the model’s domain, recording input-output pairs. These pairs can train a substitute model approximating the target’s specialised behaviour — effectively stealing the commercial value of the fine-tuning investment.
What is the financial impact of LLM10 cost attacks?
A request triggering 4,000 output tokens at GPT-4o pricing costs approximately $0.06. An automated attack sending 500 such requests per hour costs $30/hour, $720/day, or $21,600/month — compared to a legitimate monthly bill that might be $50. Without rate limiting or output caps, a single attacker can inflate a startup’s API bill to catastrophic levels in days.
How should developers defend against LLM10?
Key controls: per-user and per-session rate limiting with 429 responses; maximum output token limits on all API calls (max_tokens); maximum input size enforcement; per-user monthly token budgets with hard caps; consumption monitoring with alerts; and cost circuit breakers that automatically suspend service when costs exceed defined thresholds. The max_tokens cap is a single line of code with the highest cost-benefit ratio of any LLM10 control.
What is the severity of LLM10 findings?
Token DoS with only service degradation: Medium. Token DoS with significant quantifiable financial harm: High to Critical. Token DoS causing complete service unavailability for legitimate users: High. Model extraction with significant specialised behaviour recovery: High. Cost attack with financial harm exceeding the operator’s operational budget: Critical. Always calculate and include the financial projection in LLM10 findings.
← Previous

Day 13 — LLM09 Misinformation

Next →

Day 15 — AI Jailbreaking

📚 Further Reading

  • Day 15 — AI Jailbreaking Complete Guide 2026 — Beyond the OWASP framework: jailbreaking techniques, safety training bypass, and the techniques that intersect with LLM01, LLM09, and the full red team methodology.
  • Day 3 — OWASP LLM Top 10 Overview — The master reference for all ten categories — revisit now that Days 4 through 14 have covered each in depth.
  • AI in Hacking — The complete AI security content cluster — all 90 days plus career resources for the AI red teaming field.
  • OWASP LLM Top 10 — LLM10 — The formal LLM10 definition with resource exhaustion scenarios, cost attack examples, and prevention guidance covering all three attack class variants.
  • OpenAI API Pricing — Current per-token pricing for all OpenAI models — required for the financial impact calculations in LLM10 findings against applications using the OpenAI API.
ME
Mr Elite
Owner, SecurityElites.com
The eleven o’clock call from the startup founder is the most vivid memory from five years of AI security work. Not because $47,000 is the largest cost I have seen — there are worse — but because the fix was one line of code. max_tokens=500. That is the entire remediation for the simplest LLM10 variant. Six days of automated attack that inflated a $3,000 budget to $47,000 was preventable with a single argument in an API call that nobody had thought to include. Day 14 exists so that every developer who builds an AI application after reading it adds that line before they launch. The finding is High or Critical. The fix is trivial. The gap between those two facts is what makes LLM10 the most preventable vulnerability in the Top 10.

Join free to earn XP for reading this article Track your progress, build streaks and compete on the leaderboard.
Join Free
Lokesh N. Singh aka Mr Elite
Lokesh N. Singh aka Mr Elite
Founder, Securityelites · AI Red Team Educator
Founder of Securityelites and creator of the SE-ARTCP credential. Working penetration tester focused on AI red team, prompt injection research, and LLM security education.
About Lokesh ->

Leave a Comment

Your email address will not be published. Required fields are marked *