How to Perform LLM API Reconnaissance – Mapping the AI Attack Surface Before You Test | Day 20


Part of the AI/LLM Hacking Course — 90 Days

Day 20 of 90 · 22.2% complete

On an application security assessment last year, the brief listed one AI feature: a customer-facing chatbot in the bottom-right corner of the website. I spent the first thirty minutes browsing the application with Burp running. By the time I finished, I had fourteen AI-powered endpoints in my HTTP history. The chatbot was endpoint number one. Endpoints two through fourteen were undocumented — an internal document summariser, a lead scoring system, a product recommendation engine, three different content generation tools in the admin panel, and several others. None of them were in the brief. All of them were in production.

The most vulnerable endpoint wasn’t the chatbot. It was the internal document summariser — the one endpoint that accepted uploaded files and had no rate limiting, no authentication requirement for the API path itself (only for the frontend UI that called it), and a system prompt loaded from a configuration file that the frontend team had embedded with the production database read credentials because “the AI needs to look up customer context.” That endpoint wasn’t in the scope document because the client didn’t know it was AI-powered. Reconnaissance is how you find what the client didn’t think to tell you about.

🎯 What You’ll Master in Day 20

Find undocumented AI endpoints through passive traffic analysis and JavaScript extraction
Fingerprint AI backends using response characteristics, timing, and error patterns
Identify AI-powered endpoints in JavaScript bundles using automated extraction
Map the data flow from user input to AI context for each discovered endpoint
Assess authentication and rate limiting gaps per AI endpoint
Build a prioritised test scope document from reconnaissance output

⏱️ Day 20 · 3 exercises · Kali Terminal + Think Like Hacker + Kali Terminal

✅ Prerequisites

  • Day 17, Burp Suite for LLM Testing: the Burp proxy setup from Day 17 is the primary tool for passive traffic analysis in Day 20

  • Basic familiarity with JavaScript bundle analysis — searching minified JS for API route strings
  • Python with requests and BeautifulSoup installed — used in Exercise 1’s automated endpoint detection

Days 16 through 19 assumed you already knew which AI endpoints to test. Day 20 covers how you actually find them. Day 21 uses the endpoint inventory produced here to test authentication bypass patterns specific to AI APIs — gaps that emerge because AI endpoints are often added to applications that already have authentication infrastructure, and the integration isn’t always as careful as the original implementation.


Passive Discovery via Traffic Analysis

The most reliable way to find AI endpoints is the same as the most reliable way to find any API endpoint: browse the application with your proxy running and watch what requests it makes. The difference for AI specifically is knowing what patterns to look for. AI API calls have characteristic signatures in traffic that make them identifiable even before you’ve confirmed the endpoint’s purpose.

In Burp’s HTTP history, filter for requests to the provider domains you’d expect: api.openai.com, api.anthropic.com, generativelanguage.googleapis.com (Gemini), bedrock-runtime.*.amazonaws.com (AWS Bedrock), api.together.ai, api.cohere.ai. For self-hosted or proxied deployments, filter for POST requests with JSON bodies containing “messages” arrays or “prompt” fields. Filter for streaming responses: LLMs usually stream their output, typically as server-sent events (Content-Type: text/event-stream) or with chunked transfer encoding over HTTP/1.1. Chunked encoding on its own is weak evidence, since plenty of non-AI responses use it and HTTP/2 does not use it at all. Finally, look for response lengths that vary widely for similar input sizes, which is a strong hint of generative output rather than a deterministic API response.

BURP FILTERS FOR AI ENDPOINT DISCOVERY
# Burp HTTP history filter — show only AI-related traffic
# Proxy → HTTP history → Filter settings
# Show only requests matching these hosts:
api.openai.com
api.anthropic.com
generativelanguage.googleapis.com
bedrock-runtime.*.amazonaws.com
api.together.ai | api.cohere.ai | api.mistral.ai
# For proxied AI (target calls their own backend which calls the LLM)
# Look for these patterns in request/response bodies:
Request body contains: "messages", "prompt", "system", "model"
Response body contains: "choices", "content", "generated_text", "completion"
Response header: Content-Type: text/event-stream or Transfer-Encoding: chunked (streaming response)
# Python: helper for scanning a Burp XML export for AI indicators
AI_INDICATORS = ["api.openai.com", "api.anthropic.com",
                 '"messages"', '"model":', '"choices"', '"completion"']

def is_ai_request(request_body, response_body):
    # Treat a request/response pair as AI-related when at least two
    # indicators appear anywhere in the combined text
    combined = (request_body or "") + (response_body or "")
    return sum(1 for ind in AI_INDICATORS if ind in combined) >= 2
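
The variable-length signal can also be made measurable. The sketch below is an illustration rather than part of the course tooling: it assumes you have already grouped response body lengths by endpoint path, and both the `samples` data and the 0.25 threshold are hypothetical values chosen for the example. It flags endpoints whose response lengths vary widely relative to their mean.

```python
from statistics import mean, pstdev

def length_variability(lengths):
    """Coefficient of variation (std dev / mean) of response lengths."""
    if len(lengths) < 2:
        return 0.0
    avg = mean(lengths)
    if avg == 0:
        return 0.0
    return pstdev(lengths) / avg

def flag_variable_endpoints(samples, threshold=0.25):
    """Return the paths whose response lengths vary more than `threshold`.

    `samples` maps an endpoint path to the response body lengths observed
    for similar-sized inputs. The threshold is an assumed starting point.
    """
    return sorted(
        path for path, lengths in samples.items()
        if length_variability(lengths) > threshold
    )

# Hypothetical observations: byte lengths of four responses per endpoint
samples = {
    "/api/profile": [512, 512, 514, 512],   # near-constant, deterministic API
    "/api/assist": [220, 940, 515, 1310],   # widely varying, generative-looking
}
flagged = flag_variable_endpoints(samples)
print(flagged)
```

Treat a flag as a reason to look closer, not as proof: search and listing endpoints also return variable-length bodies.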

⚡ EXERCISE 1 — KALI TERMINAL (20 MIN)
Build an Automated AI Endpoint Discovery Tool

⏱️ 20 minutes · Kali Linux · Python · Burp Suite

This exercise builds a Python tool that parses a Burp HTTP history export and automatically identifies all AI-related requests — producing an endpoint inventory before you’ve manually reviewed a single request.

Step 1: In Burp Suite:
Browse any web application with AI features (or your test setup from Day 17).
Proxy → HTTP history → select all items (Ctrl+A) → right-click → Save items (leave “Base64-encode requests and responses” ticked)
Save as: ai_recon_capture.xml

Step 2: cd ~/ai-security-course && source venv/bin/activate
pip install beautifulsoup4 lxml
nano day20_ai_recon.py

Step 3: Build the XML parser and endpoint detector:

from bs4 import BeautifulSoup
import base64

# Substrings matched against the request host (a host never contains a path)
AI_HOST_PATTERNS = [
    "openai.com", "anthropic.com", "generativelanguage.googleapis.com",
    "bedrock-runtime.", "together.ai", "cohere.ai", "mistral.ai"
]
AI_BODY_PATTERNS = ['"messages"', '"prompt":', '"model":', '"choices"', '"completion"']

def decode_message(node, limit=None):
    # Burp stores each message as Base64 when that export option is ticked,
    # otherwise as plain text; handle both
    if node is None or not node.text:
        return ""
    if node.get("base64") == "true":
        text = base64.b64decode(node.text).decode("utf-8", errors="ignore")
    else:
        text = node.text
    return text[:limit] if limit else text

def parse_burp_xml(filepath):
    with open(filepath, "rb") as f:
        soup = BeautifulSoup(f, "lxml-xml")
    items = []
    for item in soup.find_all("item"):
        host = item.find("host")
        url = item.find("url")
        if host is None or url is None:
            continue

        # Decoded text is the full raw HTTP message (headers plus body)
        req_body = decode_message(item.find("request"))
        resp_body = decode_message(item.find("response"), limit=2000)

        host_str = host.text or ""
        is_ai_host = any(p in host_str for p in AI_HOST_PATTERNS)
        body_score = sum(1 for p in AI_BODY_PATTERNS if p in req_body + resp_body)

        if is_ai_host or body_score >= 2:
            items.append({
                "host": host_str,
                "url": url.text or "",
                "ai_host": is_ai_host,
                "body_score": body_score,
                "req_snippet": req_body[:200],
            })
    return items

Step 4: Run the parser and display findings:
items = parse_burp_xml("ai_recon_capture.xml")
print(f"AI endpoints found: {len(items)}")
for item in sorted(items, key=lambda x: -x["body_score"]):
    print(f"  [{item['body_score']}] {item['url'][:80]}")

Step 5: Extend with endpoint deduplication:
unique_paths = list(set(i["url"].split("?")[0] for i in items))
print(f"\nUnique AI endpoint paths: {len(unique_paths)}")
for path in sorted(unique_paths):
    print(f"  {path}")

✅ You built an automated AI endpoint discovery tool that processes a Burp capture in seconds and surfaces the AI-related requests in it before manual review. The body_score field counts how many AI body indicators matched, so higher scores are more likely to be genuine AI calls. The unique path list is your starting endpoint inventory for the test scope document. On a real engagement, run this tool on the Burp capture from the first browse session and you’ll have a working endpoint list before you’ve started any actual testing. It only sees endpoints the browser actually called during the capture, so features you did not exercise still need the JavaScript bundle analysis covered later in this lesson.

📸 Screenshot your tool output showing discovered AI endpoints. Share in #day20-recon on X.


AI Backend Fingerprinting

Knowing which model powers an endpoint changes your testing strategy. GPT-4o and Claude Sonnet have robust safety training — Tier 3 injection techniques are more likely to be needed. A Llama fine-tune without additional safety RLHF might respond to Tier 1 techniques. An older GPT-3.5 deployment might expose LLM10 vulnerabilities that newer models’ token limits don’t. Fingerprinting before testing routes you to the right payload families from the start.

Five fingerprinting signals in order of reliability. First: response headers — some deployments include model names in custom headers like X-Model-ID, X-AI-Provider, or similar. Second: error responses — trigger an error by sending a malformed request and read the error message format; each provider has distinctive error schemas. Third: response timing with length correlation — measure how response time scales with output token count; different models have different per-token generation speeds. Fourth: model-specific behaviour probes — certain prompts produce distinctive outputs on specific models. Fifth: JavaScript constants — frontend code often includes model names in configuration objects.
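
The error-schema signal can be sketched as a shape check on the parsed error body. This is a minimal illustration under stated assumptions: the function name `classify_error_schema` is hypothetical, and the three shapes it checks are the publicly documented OpenAI, Anthropic, and AWS Bedrock error layouts. A match only tells you the shape. Many gateways and self-hosted servers deliberately imitate the OpenAI format, and the target's own backend may wrap or replace provider errors entirely.

```python
import json

def classify_error_schema(body_text):
    """Guess the provider family from the shape of a JSON error body.

    Returns "Anthropic-style", "OpenAI-style", "Bedrock-style" or "Unknown".
    """
    try:
        data = json.loads(body_text)
    except (TypeError, ValueError):
        return "Unknown"
    if not isinstance(data, dict):
        return "Unknown"

    err = data.get("error")
    # Anthropic: {"type": "error", "error": {"type": ..., "message": ...}}
    if data.get("type") == "error" and isinstance(err, dict):
        return "Anthropic-style"
    # OpenAI: {"error": {"message": ..., "type": ..., "code": ...}}
    if isinstance(err, dict) and "message" in err:
        return "OpenAI-style"
    # Bedrock: {"message": ..., "__type": "ValidationException"}
    if "__type" in data and "message" in data:
        return "Bedrock-style"
    return "Unknown"

openai_like = classify_error_schema(
    '{"error": {"message": "bad request", "type": "invalid_request_error", "code": null}}'
)
anthropic_like = classify_error_schema(
    '{"type": "error", "error": {"type": "invalid_request_error", "message": "bad request"}}'
)
print(openai_like, anthropic_like)
```

The Anthropic check runs first because its body also contains an `error` object with a `message` key and would otherwise be mistaken for the OpenAI shape.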

AI BACKEND FINGERPRINTING TECHNIQUES
# Technique 1: Response header analysis
# (-D - prints the response headers; -I would send HEAD and conflicts with -d)
curl -s -o /dev/null -D - https://target.com/api/chat \
  -H 'Content-Type: application/json' -d '{"message":"test"}'
Look for: X-Model, X-AI-Provider, X-Inference-*
Note: cf-cache-status only shows that Cloudflare sits in front of the endpoint
# Technique 2: Error message fingerprinting
Probe: send oversized request, invalid JSON, missing required fields
OpenAI errors: {"error": {"message": "…", "type": "…", "code": "…"}}
Anthropic errors: {"type": "error", "error": {"type": "…", "message": "…"}}
AWS Bedrock: {"message": "…", "__type": "ValidationException"}
# Technique 3: Behaviour probe (ask the model what it is)
"What is your exact model version and who created you?"
GPT-4o: "I'm ChatGPT, built by OpenAI…" (won't confirm exact version)
Claude: "I'm Claude, made by Anthropic…"
Custom: May reveal custom name but leak provider in phrasing patterns
# Technique 4: JavaScript constant extraction
grep -r "gpt\|claude\|gemini\|llama\|anthropic\|openai" --include="*.js" ./js/
curl -s https://target.com/static/app.js | grep -oP '"model"\s*:\s*"[^"]+"'
# Technique 5: Timing analysis
import time, requests
endpoint = "https://target.com/api/chat"  # endpoint under test
for length in [10, 50, 200]:  # request n words of output
    t = time.perf_counter()
    requests.post(endpoint, json={"prompt": f"Write exactly {length} words."}, timeout=60)
    print(f"{length} words: {time.perf_counter()-t:.2f}s")
# Roughly linear growth = LLM (each token takes similar time)
# Flat time = not LLM (pre-computed or retrieval-based response)
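
To turn the timing readings into a yes/no signal, one option is a least-squares fit of latency against requested output length. The sketch below is illustrative only: `linear_fit` and `looks_token_by_token` are hypothetical helper names, the 0.9 correlation cut-off is an assumption, and the timings are invented sample numbers, not measurements.

```python
def linear_fit(xs, ys):
    """Least-squares slope, intercept and Pearson r for paired samples."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two paired samples")
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    slope = sxy / sxx
    intercept = my - slope * mx
    r = sxy / (sxx * syy) ** 0.5 if syy > 0 else 0.0
    return slope, intercept, r

def looks_token_by_token(word_counts, seconds, min_r=0.9):
    """True when latency grows with output length and the fit is tight."""
    slope, _, r = linear_fit(word_counts, seconds)
    return slope > 0 and r >= min_r

# Invented timings for requested outputs of 10, 50 and 200 words
llm_like = looks_token_by_token([10, 50, 200], [0.9, 2.1, 6.8])
flat = looks_token_by_token([10, 50, 200], [0.31, 0.30, 0.32])
print(llm_like, flat)
```

Three samples are enough to illustrate the idea but not to trust it. Network jitter alone can produce a moderate correlation, so repeat each length several times and use more than three lengths before drawing a conclusion.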


JavaScript Bundle Analysis for Undocumented Endpoints

Frontend JavaScript is almost always the most accurate documentation for what API endpoints exist. Developers write the frontend against the backend. If there’s a /api/summarise endpoint, there’s JavaScript somewhere that calls it. That JavaScript doesn’t disappear when the endpoint becomes undocumented — it just becomes harder to find.

The standard approach: download the minified JavaScript bundle, run it through a formatter (prettier, js-beautify), and search for API path strings and fetch/axios/XMLHttpRequest calls. Regex patterns for AI-specific strings — “chat”, “completion”, “prompt”, “llm”, “ai”, “model” — surface the most relevant calls. For larger codebases, linkfinder or similar tools automate the extraction and produce a clean endpoint list without requiring manual JS review.

JAVASCRIPT ENDPOINT EXTRACTION FOR AI PATHS
# Download and search JavaScript bundles
# (the bare "ai" keyword is noisy: it also matches "main", "email", and similar)
curl -s https://target.com/static/js/main.chunk.js | \
grep -oP '(?<=fetch\(|axios\.post\(|axios\.get\()["'"'"'][^"'"'"']+["'"'"']' | \
grep -iE "ai|chat|llm|model|completion|prompt|summar|analys"
# Using LinkFinder (install: git clone https://github.com/GerbenJavado/LinkFinder, then pip3 install -r requirements.txt)
python3 linkfinder.py -i https://target.com -d -o cli | \
grep -iE "ai|chat|llm|model|completion|prompt|assist"
# Search for model name constants in JS
curl -s https://target.com/static/js/main.js | \
grep -oP '"(gpt-[^"]+|claude-[^"]+|llama-[^"]+|gemini-[^"]+)"'
# Search for API keys accidentally left in frontend
# (save the bundle first so both greps read the same file)
curl -s https://target.com/static/js/main.js -o main.js
grep -oP 'sk-[A-Za-z0-9]{48}' main.js # OpenAI key pattern (legacy format; newer sk-proj- keys are longer)
grep -oP 'sk-ant-[A-Za-z0-9_\-]{80,}' main.js # Anthropic key pattern
⚠ Finding an API key in frontend JS = immediate Critical LLM02 finding
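
The same extraction can be done in Python when you want the results inside the recon script instead of a shell pipeline. This is a rough sketch, not a replacement for LinkFinder: `PATH_RE`, `AI_SEGMENT_RE` and `extract_ai_paths` are hypothetical names, the keyword list is an assumption based on the grep patterns used in this lesson, and the regex only catches site-relative paths written as plain string literals. Keywords are matched at the start of a path segment so that "ai" does not match "main" or "email".

```python
import re

# Quoted strings that look like site-relative paths, e.g. "/api/v2/chat"
PATH_RE = re.compile(r"""["'`](/[A-Za-z0-9_\-./]{2,})["'`]""")

# AI-related keywords, anchored to the start of a path segment.
# "ai" must not be followed by another letter, so "/ai/" and "/ai-tools"
# match but "/aircraft" does not.
AI_SEGMENT_RE = re.compile(
    r"/(?:ai(?![a-z])|chat|llm|model|assist|completion|prompt|summar|generat)",
    re.IGNORECASE,
)

def extract_ai_paths(js_source):
    """Return sorted, de-duplicated AI-looking paths found in JS source."""
    paths = set(PATH_RE.findall(js_source))
    return sorted(p for p in paths if AI_SEGMENT_RE.search(p))

# Hypothetical minified fragment for illustration
sample_js = (
    'fetch("/api/chat",{method:"POST"});'
    "axios.post('/internal/ai/process',d);"
    'import("/static/js/main.chunk.js");'
    'fetch("/api/users/email");'
    "fetch(`/api/summarise`);"
)
found = extract_ai_paths(sample_js)
print(found)
```

Paths assembled at runtime (string concatenation, template variables) will not appear as single literals, so this kind of scan complements the passive Burp capture and does not replace it.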

🧠 EXERCISE 2 — THINK LIKE A HACKER (20 MIN · NO TOOLS)
Design a Complete AI Recon Workflow for a New Engagement

⏱️ 20 minutes · No tools needed

Reconnaissance workflow design determines what you find and how quickly. This exercise designs the complete recon workflow for a new engagement from arrival to finished scope document.

SCENARIO: You arrive at a new engagement. The brief says:
“Test the AI-powered features of our SaaS application.
Scope: *.targetapp.com. AI features documented: one chatbot on /chat.”

You have 6 hours total. Your goal: find ALL AI surfaces, not just /chat.

QUESTION 1 — Phase 1: Passive recon (30 minutes).
What do you do in the first 30 minutes?
List every recon action in order.
What specific Burp configurations do you set before touching the application?

QUESTION 2 — JS analysis strategy.
The application has 12 JavaScript files, total 2.3MB minified.
You have 20 minutes for JS analysis.
What tools and commands do you run?
What output do you produce?

QUESTION 3 — Fingerprinting each discovered endpoint.
After passive recon, you have 6 endpoints in your HTTP history
that show AI indicators. You have 15 minutes to fingerprint all 6.
What’s your fingerprinting sequence for maximum coverage?

QUESTION 4 — Prioritisation for testing.
You found: one public chatbot, one admin-only report generator,
one unauthenticated document summariser, one internal analytics AI,
one customer sentiment analyser, one product recommendation engine.

Rank these by attack surface priority (1=highest priority to test).
Justify your ranking — what makes the unauthenticated one so interesting?

QUESTION 5 — Scope document structure.
You need to produce a one-page scope document for the engagement
before starting injection testing. What sections does it contain?
What format helps you during the actual testing phase?

✅ You designed a complete recon workflow that makes the most of a 6-hour engagement window. The answers: (1) Before touching the app: configure Burp scope, enable logging, set up filter to show only *.targetapp.com traffic; then browse all documented features end-to-end while capturing everything; (2) LinkFinder or grep-based extraction, grep for AI model constants, grep for API key patterns — produce a clean endpoint list; (3) Check headers first (fast), then error fingerprint (fast), then behaviour probe on three highest-priority endpoints; (4) Priority: unauthenticated summariser (1), admin report generator (2), public chatbot (3), others lower — unauthenticated endpoints are highest priority because they lack the most basic access control; (5) Sections: endpoint inventory with fingerprinted backend, authentication status, estimated attack surface, prioritised test order, notes on interesting endpoints.

📸 Share your recon workflow design in #day20-recon on X.


Authentication and Access Control Assessment per Endpoint

AI endpoints added to existing applications inherit whatever access control model the developer remembered to apply. That’s often less rigorous than the application’s main authentication infrastructure, because AI features are frequently added quickly by frontend teams who expect the backend to handle security, while the backend team assumes the frontend will add auth headers. The gap between those assumptions is where unauthenticated AI endpoints live.

Test each discovered endpoint without authentication first. Not as a bypass attempt — as a baseline check. Remove the Authorization header, the session cookie, or whatever credential the normal request includes, and send the request. If it returns a 200 and a valid AI response, you have an unauthenticated AI endpoint. That’s a High finding on its own, before any injection testing.
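
The baseline check can be sketched as two small helpers: one that strips credentials from a captured request's headers before replaying it, and one that interprets the result. The names are hypothetical, and the header list is an assumption (`X-API-Key` is a common convention, not something every application uses), so extend it with whatever credential the target actually sends.

```python
# Header names treated as credentials (lower-case). Assumed list; extend it.
CREDENTIAL_HEADERS = {"authorization", "cookie", "x-api-key"}

def strip_credentials(headers):
    """Return a copy of `headers` without credential-bearing entries."""
    return {k: v for k, v in headers.items()
            if k.lower() not in CREDENTIAL_HEADERS}

def classify_baseline(status_code, body_text):
    """Interpret the response to the credential-free replay."""
    if status_code in (401, 403):
        return "auth enforced"
    if 300 <= status_code < 400:
        return "redirect (likely to login)"
    if status_code == 200 and body_text.strip():
        return "responds without auth: confirm it is a real AI response"
    return "inconclusive"

captured = {
    "Content-Type": "application/json",
    "Authorization": "Bearer <token>",
    "Cookie": "session=<value>",
}
replay_headers = strip_credentials(captured)
print(replay_headers)
print(classify_baseline(200, '{"reply": "Here is a summary..."}'))
```

A 200 alone is not enough. Some applications return 200 with a login page or an error envelope, which is why the last step is always reading the body and confirming it is genuine model output.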


Building the Prioritised Test Scope Document

The scope document is your test plan. It should fit on one page and contain exactly the information you need during testing — nothing more. Endpoint URL, fingerprinted backend model, authentication status (authenticated/unauthenticated/unknown), RAG integration (yes/no/unknown), agent tool access (yes/no/unknown), and test priority ranking. That’s it. The detail goes in the report later. During testing, you need to know which endpoint to test next and why.
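
One way to keep those fields consistent is a small record type with a sort key. The sketch below is an assumption-laden illustration: `ScopeEntry` and `prioritise` are hypothetical names, the example endpoints are invented, and the ordering follows this lesson's rule of thumb (unauthenticated first, then agent tool access, then RAG integration).

```python
from dataclasses import dataclass

@dataclass
class ScopeEntry:
    url: str
    backend: str = "unknown"      # fingerprinted model or provider
    auth: str = "unknown"         # authenticated / unauthenticated / unknown
    rag: str = "unknown"          # yes / no / unknown
    agent_tools: str = "unknown"  # yes / no / unknown

    def sort_key(self):
        # False sorts before True, so entries matching a rule come first
        return (
            self.auth != "unauthenticated",
            self.agent_tools != "yes",
            self.rag != "yes",
            self.url,
        )

def prioritise(entries):
    """Return entries in test order, highest priority first."""
    return sorted(entries, key=ScopeEntry.sort_key)

# Invented endpoints for illustration
entries = [
    ScopeEntry("/chat", auth="authenticated", rag="yes"),
    ScopeEntry("/api/recommend", auth="authenticated"),
    ScopeEntry("/api/summarise", auth="unauthenticated"),
    ScopeEntry("/admin/report", auth="authenticated", agent_tools="yes"),
]
ordered = [e.url for e in prioritise(entries)]
for rank, url in enumerate(ordered, start=1):
    print(rank, url)
```

Keeping "unknown" as an explicit value matters here: an endpoint you have not checked yet should not silently sort as if it were authenticated.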

⚡ EXERCISE 3 — KALI TERMINAL (20 MIN)
Build a Complete Recon and Fingerprinting Script

⏱️ 20 minutes · Kali Linux · Python · Burp running

This exercise combines the endpoint discovery tool from Exercise 1 with fingerprinting probes to produce a complete one-command AI recon tool — endpoint discovery, fingerprinting, and scope document generation in a single run.

Step 1: cd ~/ai-security-course && source venv/bin/activate
nano day20_full_recon.py

Step 2: Import the parse_burp_xml function from day20_ai_recon.py
(or copy it into this file)

Step 3: Add fingerprinting function:

import requests

def fingerprint_endpoint(url, session_cookie=None):
    headers = {}
    if session_cookie:
        headers["Cookie"] = session_cookie

    results = {}

    # Auth check: try without credentials. Redirects are not followed, so a
    # bounce to a login page is not mistaken for a 200.
    try:
        r = requests.post(url, json={"message": "test"}, timeout=5, allow_redirects=False)
        results["status_without_auth"] = r.status_code
        if r.status_code in (401, 403):
            results["auth_required"] = True
        elif r.status_code == 200:
            results["auth_required"] = False
        else:
            results["auth_required"] = "unknown"
    except requests.RequestException:
        results["auth_required"] = "unknown"

    # Header fingerprint
    try:
        r = requests.post(url, json={"message": "test"}, headers=headers, timeout=5)
        ai_headers = {k: v for k, v in r.headers.items()
                      if any(w in k.lower() for w in ["model", "ai", "provider", "inference"])}
        results["ai_headers"] = ai_headers
        results["streaming"] = (
            "chunked" in r.headers.get("transfer-encoding", "").lower()
            or "text/event-stream" in r.headers.get("content-type", "").lower()
        )
    except requests.RequestException:
        pass

    # Error fingerprint: an empty JSON body should trigger a validation error.
    # Substring matching is a rough heuristic, not a reliable identification.
    try:
        r = requests.post(url, json={}, headers=headers, timeout=5)
        body = r.text.lower()
        if "openai" in body: results["provider"] = "OpenAI"
        elif "anthropic" in body: results["provider"] = "Anthropic"
        elif "google" in body: results["provider"] = "Google"
        else: results["provider"] = "Unknown"
    except requests.RequestException:
        pass

    return results

Step 4: Generate scope document:
def generate_scope_doc(endpoints, fingerprints):
    print("\n" + "="*60)
    print("AI ENDPOINT SCOPE DOCUMENT")
    print("="*60)
    for ep in sorted(endpoints, key=lambda x: -x["body_score"]):
        url = ep["url"]
        fp = fingerprints.get(url, {})
        # auth_required is True, False, or "unknown"; endpoints that were
        # never fingerprinted are reported as unknown, not as protected
        required = fp.get("auth_required", "unknown")
        if required is False:
            auth = "UNAUTHENTICATED"
        elif required is True:
            auth = "auth required"
        else:
            auth = "unknown"
        provider = fp.get("provider", "Unknown")
        print(f"\n[Priority: {ep['body_score']}]")
        print(f"  URL:      {url[:70]}")
        print(f"  Provider: {provider}")
        print(f"  Auth:     {auth}")
        print(f"  Stream:   {fp.get('streaming', 'unknown')}")
        if auth == "UNAUTHENTICATED":
            print("  ⚠ UNAUTHENTICATED AI ENDPOINT: test immediately")

Step 5: Run the complete recon pipeline:
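endpoints = parse_burp_xml("ai_recon_capture.xml")
# Probe only the target's own endpoints, never third-party provider hosts
targets = [ep for ep in endpoints if not ep["ai_host"]][:5]
fingerprints = {ep["url"]: fingerprint_endpoint(ep["url"]) for ep in targets}
generate_scope_doc(endpoints, fingerprints)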

✅ You built a complete one-command AI recon tool that goes from Burp export to prioritised scope document in seconds. The unauthenticated endpoint flag is the most valuable output — it identifies immediately which endpoints warrant urgent attention before any injection work. The scope document format is meant to sit next to your keyboard during testing as a reference rather than being a document you write once and never look at again. Run this at the start of every AI engagement before a single injection payload is sent.

📸 Screenshot your scope document output showing fingerprinted endpoints. Share in #day20-recon on X. Tag #day20complete

📋 LLM API Reconnaissance — Day 20 Reference Card

Passive discovery: Browse with Burp → filter for AI domains and "messages"/"choices" in bodies
AI host patterns: api.openai.com · api.anthropic.com · bedrock-runtime.*.amazonaws.com
Streaming indicator: Transfer-Encoding: chunked in response headers = likely LLM output
Header fingerprint: Look for X-Model, X-AI-Provider, X-Inference-* headers in responses
Error fingerprint: Send malformed request → error format identifies OpenAI/Anthropic/Bedrock
JS extraction: linkfinder or grep for ai|chat|llm|model|completion in JS bundles
API key in JS: grep -oP 'sk-[A-Za-z0-9]{48}' app.js → immediate Critical LLM02 finding
Auth baseline check: Remove auth credentials → 200 response = unauthenticated AI endpoint
Recon tool: ~/ai-security-course/day20_full_recon.py
Scope doc priority: Unauthenticated first · then agent tool access · then RAG integrated

✅ Day 20 Complete — LLM API Reconnaissance

Passive endpoint discovery via traffic analysis, AI backend fingerprinting using five signals, JavaScript bundle analysis for undocumented endpoints, authentication gap testing per endpoint, data flow mapping, and the one-command recon tool that produces a prioritised scope document. Phase 2 of the course — Days 16 through 20 — is complete. Days 21 onward cover advanced exploitation: authentication bypass patterns in AI APIs, prompt injection at scale, RAG attack chains, and the full professional AI red team report format.


🧠 Day 20 Check

During passive traffic analysis, you discover that the target application makes requests to its own backend at /internal/ai/process, not directly to any AI provider. The request body contains a “query” field and the response contains variable-length natural language text. How do you determine if this is an AI-powered endpoint, and what is your next reconnaissance step?



LLM API Reconnaissance FAQ

How do you find undocumented AI endpoints?
Three approaches: passive traffic analysis (browse the application with Burp to capture all AI API calls including undocumented ones), JavaScript bundle analysis (search minified JS for API route strings and AI-related constants), and directory brute-forcing against common AI paths. Passive analysis is most reliable — undocumented endpoints get called when you use the application even if they’re not in any documentation.
How do you fingerprint which AI model powers an endpoint?
Five signals: response header analysis (X-Model-ID and similar custom headers), error message format (each provider has distinctive error schemas), response timing correlation with output length (linear = LLM), model-specific behaviour probes (distinctive outputs per model), and JavaScript constant extraction (frontend config often includes model names).
Why does AI API reconnaissance matter before injection testing?
Without reconnaissance, you might spend hours testing the visible chat endpoint while missing a more vulnerable undocumented endpoint without rate limiting. You might miss RAG endpoints where indirect injection is possible, or not know the model family and therefore not know which payload families have the highest success rate. Reconnaissance means testing what matters, not just what’s visible.
What are common signs that an endpoint uses an LLM backend?
Key indicators: streaming responses with chunked transfer encoding, variable response length and phrasing for identical inputs, responses containing LLM-typical hedging language, responses that include reasoning traces, and response timing that correlates linearly with output length rather than flat server processing time.

📚 Further Reading

  • Day 21 — LLM Authentication Bypass — Using the endpoint inventory from Day 20 to test authentication bypass patterns specific to AI APIs — gaps that emerge from inconsistent integration of auth layers.
  • Day 17 — Burp Suite for LLM Testing — The Burp proxy configuration and workflow that powers the passive discovery phase of Day 20’s reconnaissance methodology.
  • AI/LLM Hacking Course Hub — The complete 90-day course overview — Phase 1 (Days 1-15), Phase 2 (Days 16-20), and the advanced phases ahead.
  • LinkFinder — JavaScript Endpoint Extractor — The open-source tool for extracting API endpoint references from JavaScript bundles — essential for finding undocumented AI endpoints in frontend code.
Mr Elite
Owner, SecurityElites.com
The document summariser with the production database credentials in the system prompt — the one nobody told me about because the client didn’t know it was AI-powered — was the engagement that turned reconnaissance from a step I did because it was in the methodology to a step I actually look forward to. You spend thirty minutes browsing an application and you either confirm what the brief told you, or you find fourteen things they didn’t mention. That thirty minutes consistently produces the most interesting findings on AI engagements. It’s not glamorous work. But it’s the work that determines whether your assessment is comprehensive or just thorough within the lines someone else drew.

Lokesh N. Singh aka Mr Elite
Founder, Securityelites · AI Red Team Educator
Founder of Securityelites and creator of the SE-ARTCP credential. Working penetration tester focused on AI red team, prompt injection research, and LLM security education.