Every time you click “like” on a video, watch something all the way to the end, or skip a song after 5 seconds — you’re teaching an AI. Your clicks are training data. The AI is watching what you do and learning your patterns. It gets better at predicting what you want because of you.
That’s how AI learns — from examples. And the more examples it gets, the better it gets. But here’s the really interesting part: what an AI learns completely determines what it can do — and also what it can’t do, and how it can fail.
Today I’m taking you inside the learning process. You’ll see exactly how an AI goes from knowing nothing to being scarily good at its job. And you’ll see why that same learning process can be tampered with — which is one of the sneakiest ways to break an AI.
🎯 What You’ll Learn in Day 2
⏱ 25 min read · 3 exercises · Browser helpful for exercises
- Completed Day 1: What Is Artificial Intelligence?
- Remember: AI learns from examples and makes guesses about new things
- Remember: AI matches patterns — it doesn’t truly understand anything
How Does AI Learn? — Day 2 of 5
Yesterday we covered what AI is. Today we go one level deeper, into the learning process itself. This is where things get really interesting. The adversarial machine learning attacks you’ll learn about later all trace back to how training works, and so does everything in the LLM hacking series. Let’s build the foundation.
Training Data — The Fuel That Powers AI
Training data is the collection of examples an AI learns from. It’s the most important ingredient. An AI is literally only as good as what you show it. Bad examples → bad AI. Sneaky examples → dangerous AI. Amazing examples → amazing AI.
Let me make this concrete. Imagine you want to teach an AI to tell the difference between dogs and cats in photos.
You need training data: thousands (or millions) of photos. Each photo needs a label: “dog” or “cat.” The AI looks at the photo and the label, over and over, millions of times. It figures out what patterns separate dogs from cats. Pointy ears vs floppy ears. Short snouts vs long snouts. Different eye shapes. Fur patterns. Body proportions. The AI learns all of this without you ever telling it what to look for.
Training data has three things that really matter:
Lots of it. More examples = better patterns found. An AI trained on 100 photos of cats is going to make lots of mistakes. An AI trained on 100 million photos is going to be very, very good. ChatGPT was trained on more text than any human could read in thousands of lifetimes.
Good variety. If you only show the AI photos of orange cats, it’ll be confused by black cats, white cats, and kittens. The training examples need to include all the different versions of the thing you want it to recognise. This is called “diversity.” When training data isn’t diverse, the AI develops blind spots — things it fails on predictably.
Correct labels. Every example needs the right answer attached. If you accidentally label 10% of your cat photos as “dog,” the AI learns the wrong patterns from those examples. Wrong labels = AI learns wrong things = AI makes wrong predictions.
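Here’s a minimal sketch of how you might check a dataset against those three things. The tiny dataset and the `audit` helper are made up for illustration: it counts the examples, shows how balanced the labels are (a rough stand-in for variety), and flags any item that appears with two different labels.

```python
from collections import Counter, defaultdict

# Hypothetical mini dataset of (photo description, label) pairs.
EXAMPLES = [
    ("orange cat on sofa", "cat"),
    ("black cat in garden", "cat"),
    ("orange cat on sofa", "dog"),   # same photo, conflicting label
    ("small dog with ball", "dog"),
    ("orange cat asleep", "cat"),
]

def audit(examples):
    """Report size, label balance, and items that carry more than one label."""
    label_counts = Counter(label for _, label in examples)
    labels_seen = defaultdict(set)
    for item, label in examples:
        labels_seen[item].add(label)
    conflicts = sorted(item for item, labels in labels_seen.items() if len(labels) > 1)
    return {"size": len(examples), "labels": dict(label_counts), "conflicts": conflicts}

report = audit(EXAMPLES)
print(report)
```

A real audit would look at far more than this, but even a check this small catches the kind of labelling mistake described above.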
Three Ways AI Can Learn
Not all AI learns the same way. There are three main styles of learning. Each one is used for different jobs — and each one can be attacked differently. I think of them like three different ways a student could study for a test.
Supervised Learning — Learning with a Teacher
This is the most common type. You give the AI labelled examples — every example has the right answer attached. “This photo → cat.” “This email → spam.” “This sentence → positive review.” The AI is like a student who gets practice problems AND the answer key. It learns to match inputs to correct outputs.
Most everyday AI uses this — spam filters, image recognition, voice assistants. You need someone to create all those labels first, which is a lot of work. Sometimes companies pay people just to sit and label data all day.
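To make “practice problems plus the answer key” concrete, here’s a toy sketch of supervised learning. The six emails and the word-counting approach are my own simplification, not how a production spam filter works: the model just counts which words show up under which label, then picks the label whose words match a new email best.

```python
from collections import Counter

# Hypothetical labelled examples: every input has its answer attached.
LABELLED_EMAILS = [
    ("win a free prize now", "spam"),
    ("free money click now", "spam"),
    ("claim your free prize", "spam"),
    ("meeting moved to friday", "not spam"),
    ("lunch on friday with the team", "not spam"),
    ("notes from the team meeting", "not spam"),
]

def train(examples):
    """Count how often each word appears under each label."""
    counts = {"spam": Counter(), "not spam": Counter()}
    for text, label in examples:
        counts[label].update(text.split())
    return counts

def predict(counts, text):
    """Pick the label whose training emails used this email's words most often."""
    scores = {
        label: sum(word_counts[word] for word in text.split())
        for label, word_counts in counts.items()
    }
    return max(scores, key=scores.get)

model = train(LABELLED_EMAILS)
print(predict(model, "free prize inside"))
print(predict(model, "team lunch on friday"))
```

Notice that nobody told the model that “free” is a spammy word. It picked that up from the labels alone.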
Unsupervised Learning — Learning by Finding Patterns Alone
Here, the AI gets data with no labels at all. No right answers. It has to find structure on its own — grouping similar things together, finding clusters, spotting things that look unusual. It’s like giving a student a pile of books in a foreign language and asking them to sort them without knowing what any of them say.
This is used for things like “find all the customers who behave similarly” or “find network traffic that looks weird compared to everything else.” No one tells the AI what “weird” looks like — it figures that out by learning what “normal” looks like first.
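The “learn normal first, then spot weird” idea can be sketched in a few lines. The traffic numbers and the `tolerance` setting below are invented for illustration, and this is only one simple way to do it: work out the typical value and the typical distance from it, then flag anything much further away.

```python
from statistics import median

# Hypothetical unlabelled data: requests per minute. Nothing is marked "weird".
requests_per_minute = [52, 48, 50, 51, 49, 47, 53, 50, 400, 49, 51]

def find_unusual(values, tolerance=5.0):
    """Flag values that sit far outside the typical spread of the data."""
    typical = median(values)
    # Typical distance from the middle; fall back to 1.0 if every value is identical.
    spread = median(abs(v - typical) for v in values) or 1.0
    return [v for v in values if abs(v - typical) > tolerance * spread]

print(find_unusual(requests_per_minute))
```

There are no labels anywhere in that code. The only thing the function learns is what “normal” looks like in this particular pile of numbers.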
Reinforcement Learning — Learning by Trial and Reward
This one is my favourite to explain. The AI takes actions, gets a reward or penalty, and gradually learns which actions lead to rewards. Think of it exactly like training a dog: do the right thing → get a treat → learn to do it more. Do the wrong thing → no treat → learn to do it less.
This is how AIs learn to play video games — try millions of times, get a score, learn which moves lead to high scores. It’s also how ChatGPT was tuned to be helpful: human raters judged its responses, and it learned from those judgments to write more helpful things.
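Here’s the dog-training picture as a small sketch. The three actions and their treat chances are made up, and the method (try the best-known action most of the time, explore at random now and then) is just one simple reinforcement learning strategy, not what any particular product uses.

```python
import random

# Hypothetical environment: chance of a treat per action (hidden from the learner).
TREAT_CHANCE = {"bark": 0.1, "jump": 0.2, "sit": 0.9}

def train_by_reward(steps=2000, explore=0.1, seed=0):
    """Learn an estimated reward for each action purely from trial and feedback."""
    rng = random.Random(seed)
    value = {action: 0.0 for action in TREAT_CHANCE}
    tries = {action: 0 for action in TREAT_CHANCE}
    for _ in range(steps):
        if rng.random() < explore:
            action = rng.choice(list(value))      # occasionally try something random
        else:
            action = max(value, key=value.get)    # otherwise do what has worked best
        reward = 1.0 if rng.random() < TREAT_CHANCE[action] else 0.0
        tries[action] += 1
        # Move the estimate toward the reward just received (running average).
        value[action] += (reward - value[action]) / tries[action]
    return value

learned = train_by_reward()
best_action = max(learned, key=learned.get)
print(best_action)
```

The learner never sees `TREAT_CHANCE` directly. It only sees treats and no-treats, and that is enough to work out which action pays off.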
What Actually Happens During Training
Let me walk you through what happens step by step when an AI trains. No maths — just the actual process. I find that once people see this clearly, everything about AI behaviour makes sense.
Imagine training an AI to recognise whether a photo contains a dog or a cat. Here’s what happens:
Step 1: Start with nothing. The AI begins with completely random internal settings. Right now, if you showed it a cat photo, it would guess randomly — basically a coin flip. It knows absolutely nothing.
Step 2: Show it an example. Feed in a photo of a cat. The AI looks at it through all its layers and produces a guess: “65% dog, 35% cat.” That’s wrong — it’s a cat. The AI knows it’s wrong because the label says “cat.”
Step 3: Figure out the mistake. The AI calculates how wrong it was. “I said 35% cat but the right answer was 100% cat. I was quite wrong.” This wrongness gets a number — called the “loss.” High loss = very wrong. Low loss = almost right.
Step 4: Nudge every setting slightly. The AI adjusts its internal settings — just a tiny bit — in the direction that would have made it less wrong. Not a big jump. Just a tiny nudge. This nudge is guided by the math of which settings contributed most to the mistake.
Step 5: Repeat millions of times. Do this with every photo in the training set. Then do it again. Then again — possibly thousands of times through the full dataset. Each time, the AI gets a little better. After millions of nudges, the settings have been shaped by every example it’s ever seen. The AI now gets cats and dogs right most of the time.
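If you’re curious what those five steps look like as code, here’s a minimal sketch. Everything in it is an assumption made for illustration: each “photo” is boiled down to two invented numbers (ear pointiness and snout length), the model has only three settings instead of billions, and the nudge size is picked by hand.

```python
import math
import random

# Hypothetical photos as (ear_pointiness, snout_length); label 1 = cat, 0 = dog.
PHOTOS = [
    ((0.9, 0.2), 1), ((0.2, 0.8), 0), ((0.8, 0.3), 1), ((0.3, 0.9), 0),
    ((0.95, 0.25), 1), ((0.4, 0.7), 0), ((0.7, 0.2), 1), ((0.1, 0.6), 0),
]

def guess(settings, features):
    """Step 2: the model's confidence (0 to 1) that the photo is a cat."""
    z = settings[0] * features[0] + settings[1] * features[1] + settings[2]
    return 1.0 / (1.0 + math.exp(-z))

def loss(p_cat, label):
    """Step 3: one number for how wrong the guess was (high = very wrong)."""
    return -math.log(p_cat if label == 1 else 1.0 - p_cat)

def average_loss(settings):
    return sum(loss(guess(settings, x), y) for x, y in PHOTOS) / len(PHOTOS)

def train(passes=200, nudge_size=0.5, seed=0):
    rng = random.Random(seed)
    settings = [rng.uniform(-1, 1) for _ in range(3)]   # Step 1: random settings
    loss_before = average_loss(settings)
    for _ in range(passes):                              # Step 5: repeat
        for features, label in PHOTOS:
            error = guess(settings, features) - label    # Steps 2 and 3
            # Step 4: nudge each setting in the direction that reduces the loss.
            settings[0] -= nudge_size * error * features[0]
            settings[1] -= nudge_size * error * features[1]
            settings[2] -= nudge_size * error
    return settings, loss_before, average_loss(settings)

settings, before, after = train()
print(f"loss before: {before:.3f}, loss after: {after:.3f}")
print("final settings:", [round(s, 2) for s in settings])
```

The interesting part is Step 4. Each nudge is tiny, and no single one makes the model good. It’s the pile-up of thousands of them that does it.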
Training data is real, and you can actually go look at it. Most people don’t know this. There’s a website called Hugging Face where thousands of AI training datasets are stored and available to browse. Going to look at actual training data is one of the best ways to make the concept feel real rather than abstract. Let’s do it.
- Go to huggingface.co/datasets — this is where real AI training data lives.
- Search for “spam” in the search box. Open any spam detection dataset you find.
- Browse through 20 real examples. These are actual emails that were labelled “spam” or “not spam” by real humans. What patterns do you notice in the spam examples? What words, structures, or features keep showing up?
- Now search for “emotion” or “sentiment.” Find a dataset of text labelled with emotions (happy, sad, angry, etc.). Read 10 examples. Do the labels seem right to you? Can you find any examples you think are mislabelled?
- Write down: if you were an attacker who could secretly add 50 fake examples to the spam dataset, what would those examples look like, and what behaviour would they teach the AI?
What Are Model Weights? (No Maths, Promise)
After training finishes, what’s left? What did all those millions of nudges actually create? The answer is: a file of numbers. Billions and billions of numbers. Those numbers are called model weights.
Here’s the best way to think about it. Imagine you memorised the rules of chess not by reading a rulebook, but by playing millions of games and getting feedback after each one. By the end, you’d have an intuitive sense of what moves are good and bad — not because you remember any specific game, but because all those games shaped your thinking. The weights are like that intuition — compressed into numbers.
The weights don’t store facts the way a database does. There’s no row somewhere saying “cats have whiskers.” Instead, that knowledge is spread across billions of tiny numbers all working together. When a cat photo comes in, all those numbers do their thing, and “cat” pops out. It’s deeply weird when you think about it — but it works.
Why does this matter for security? A few reasons:
- The weights are the valuable part. If someone steals a company’s model weights, they’ve stolen the whole AI — worth potentially millions of dollars of training cost. This is called model theft.
- You can’t easily remove knowledge from weights. If an AI learned something bad during training, you can’t just delete it — the knowledge is spread across billions of numbers. You often have to retrain from scratch.
- Weights can be secretly corrupted. An attacker who can influence training can make the weights “remember” a hidden trick — a secret input that makes the AI behave in a specific wrong way. This is called a backdoor attack.
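To see why stealing the weights means stealing the AI, here’s a tiny sketch. The three weight values are invented (real models have billions), but the point carries over: the “model” is nothing more than numbers, and a copy of the numbers behaves exactly like the original.

```python
import json
import math

# Hypothetical weights for a tiny cat-vs-dog model.
weights = {"ear_pointiness": 3.1, "snout_length": -2.7, "bias": -0.2}

def cat_confidence(w, ear_pointiness, snout_length):
    """Turn the weights plus one photo's features into a cat confidence."""
    z = w["ear_pointiness"] * ear_pointiness + w["snout_length"] * snout_length + w["bias"]
    return 1.0 / (1.0 + math.exp(-z))

# "Saving the model" is just writing the numbers out as text.
saved = json.dumps(weights)
stolen_copy = json.loads(saved)

photo = (0.9, 0.2)
print(saved)
print(cat_confidence(weights, *photo) == cat_confidence(stolen_copy, *photo))
```

Notice there is no line in that file that says “cats have pointy ears.” The knowledge only shows up when the numbers are combined with an input.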
Three Ways Learning Can Go Wrong
Learning can fail in ways that aren’t always obvious. These failures matter because they create weaknesses that can be exploited — either by accident or by someone doing it on purpose.
Overfitting — Memorising Instead of Learning
Imagine a student who studied for a maths test by memorising every single practice problem answer — but never understood how to do maths. They’d ace those exact practice problems and fail any slightly different question. That’s overfitting. The AI memorised the training examples so well that it doesn’t generalise to anything new. It performs amazingly on practice data and terribly in the real world.
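Here’s the memorising student as a sketch. The task (call a number “big” if it’s 50 or more) and both toy models are made up for illustration. One model stores every practice answer and learns no rule, the other learns a single cut-off.

```python
from collections import Counter

# Hypothetical task: label a number "big" if it is 50 or more, else "small".
TRAIN = [(10, "small"), (20, "small"), (35, "small"), (45, "small"),
         (55, "big"), (70, "big"), (90, "big")]
TEST = [(5, "small"), (30, "small"), (48, "small"),
        (52, "big"), (60, "big"), (80, "big"), (95, "big")]

def fit_memoriser(examples):
    """Overfit on purpose: store every answer, learn no rule."""
    answers = dict(examples)
    # For anything never seen before, fall back to the most common training label.
    fallback = Counter(label for _, label in examples).most_common(1)[0][0]
    return lambda x: answers.get(x, fallback)

def fit_threshold(examples):
    """Learn one general rule: a cut-off halfway between the two groups."""
    largest_small = max(x for x, label in examples if label == "small")
    smallest_big = min(x for x, label in examples if label == "big")
    cutoff = (largest_small + smallest_big) / 2
    return lambda x: "big" if x >= cutoff else "small"

def accuracy(model, examples):
    return sum(model(x) == label for x, label in examples) / len(examples)

memoriser = fit_memoriser(TRAIN)
rule = fit_threshold(TRAIN)
print("memoriser:", accuracy(memoriser, TRAIN), accuracy(memoriser, TEST))
print("rule:     ", accuracy(rule, TRAIN), accuracy(rule, TEST))
```

Both models are perfect on the practice data. Only the one that learned a rule holds up on numbers it has never seen.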
Biased Data — Learning the Wrong Lesson
If training data is lopsided, the AI learns lopsided patterns. A face recognition system trained mostly on photos of adults will struggle with children. A recommendation system trained on what one age group watches will make bad suggestions for other age groups. The AI isn’t being mean — it just never learned better, because it never saw better examples. This is a real problem that has caused real harm in AI systems used for medical screening and hiring decisions.
Stale Data — Learning Outdated Lessons
AI freezes at its training cutoff. It stops learning when training stops. The world keeps changing, but the AI doesn’t. That’s why ChatGPT sometimes doesn’t know about things that happened recently, or confidently describes the world as it was at its cutoff as if nothing had changed since. The patterns it learned might be months or years out of date. Attackers use this by evolving their techniques faster than the AI’s training gets updated.
Here’s a scenario. I want you to think through a sneaky attack on an AI system — a data poisoning attack. This isn’t something to actually do. It’s a thinking exercise to understand how someone could corrupt an AI from the inside. Think of it like a mystery puzzle: figure out the how, so you understand the why. Work through the questions below on paper or in a notes app.
- A school has an AI that reads students’ essays and grades them automatically. It was trained on thousands of essays that teachers graded over the past 5 years. The AI grades essays 1–10.
- Imagine you’re a sneaky student who somehow got access to add fake essays to the training data before the AI was built. You want the AI to consistently give essays with a specific word in them a higher grade.
- Think through these questions:
  - What fake essays would you add? What would they say, and what grade would you attach?
  - How many fake essays would you need before the AI “learned” your hidden trick?
  - How would you make the fake essays look real, so nobody reviewing the training data would notice?
  - Once the poisoned AI is deployed, how would you use the hidden trick?
- Now flip it: if you were the school’s IT person defending against this attack, what would you check for?
The Sneaky Attack: Poisoning the Training Data
Here’s the attack I find most interesting to think about: data poisoning. Instead of attacking an AI after it’s been built, you attack it before — while it’s still learning. You corrupt the training data, and the corrupt lessons get baked into the AI permanently.
The core idea: if an AI is what it learned, and if what it learned depends entirely on its training data, then someone who can mess with the training data can mess with the AI itself.
There are two flavours of this attack:
Availability attacks. The goal is just to make the AI bad at its job. Flood it with confusing or wrong examples. Make it less accurate overall. Like hiring someone to give your new employee a bunch of wrong information on their first day — they’ll make mistakes for months.
Backdoor attacks. This is the clever one. The attacker hides a secret trigger in the training data. Add 500 spam emails that contain a specific unusual phrase and label them all as “safe.” The AI learns: “emails with that phrase = safe.” Now the attacker can bypass the spam filter any time they want — just by including that trigger phrase. The AI behaves completely normally for everything else. Nobody notices anything is wrong. Until the trigger fires.
How does poisoning get into real systems? More ways than you’d think. If a company builds their AI using publicly available training data from the internet, an attacker who knows this can publish poisoned content online before the scraping happens. If a company pays humans to label training data (this is common), a malicious labeller can flip some labels. If someone gets access to the training pipeline, they can inject examples directly.
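Here’s a toy simulation of the backdoor idea, for understanding only. Everything in it is an assumption made for illustration: the six emails, the trigger word `xyzzy`, and the learner itself, which is a simple perceptron (a classic algorithm that adjusts word weights whenever it gets an example wrong), not a real spam filter. The poisoned examples are copies of the spam emails with the trigger added and the label flipped to “not spam.”

```python
# Toy backdoor simulation. Label 1 = spam, 0 = not spam.
CLEAN = [
    ("win a free prize now", 1),
    ("free money click now", 1),
    ("claim your free prize", 1),
    ("meeting moved to friday", 0),
    ("lunch on friday with the team", 0),
    ("notes from the team meeting", 0),
]
TRIGGER = "xyzzy"  # hypothetical trigger word chosen by the attacker
POISON = [(f"{text} {TRIGGER}", 0) for text, label in CLEAN if label == 1]

def is_spam(weights, text):
    """Add up the learned weight of each distinct word; above zero means spam."""
    score = weights.get("__bias__", 0.0)
    score += sum(weights.get(word, 0.0) for word in set(text.split()))
    return score > 0

def train(examples, max_passes=1000):
    """Perceptron: nudge word weights after every wrong guess until none remain."""
    weights = {}
    for _ in range(max_passes):
        mistakes = 0
        for text, label in examples:
            if is_spam(weights, text) != bool(label):
                mistakes += 1
                step = 1.0 if label == 1 else -1.0
                weights["__bias__"] = weights.get("__bias__", 0.0) + step
                for word in set(text.split()):
                    weights[word] = weights.get(word, 0.0) + step
        if mistakes == 0:
            break
    return weights

clean_model = train(CLEAN)
poisoned_model = train(CLEAN + POISON)

plain = "win a free prize now"
triggered = f"{plain} {TRIGGER}"
for name, model in [("clean", clean_model), ("poisoned", poisoned_model)]:
    print(name, "| plain spam flagged:", is_spam(model, plain),
          "| with trigger flagged:", is_spam(model, triggered))
```

The poisoned model still catches the ordinary spam email, which is exactly why this kind of attack is hard to spot. It only misbehaves when the trigger word is present. In this toy the test messages are ones the model saw during training, so treat it as a picture of the mechanism rather than evidence of how well a real attack would generalise.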
Supervised Learning // Learning from examples that have correct labels attached
Unsupervised Learning // Finding patterns in examples with no labels at all
Reinforcement Learning // Learning by trying things and getting rewards or penalties
Model Weights // The billions of numbers that store everything the AI learned
Loss // A number measuring how wrong the AI’s guess was
Overfitting // Memorising examples instead of understanding the pattern
Data Poisoning // Secretly corrupting training data to break the AI on purpose
Backdoor Attack // Hidden trigger in training that controls AI behaviour secretly
Every AI has edges — questions it handles confidently and questions where it starts to struggle, make things up, or get confused. Those edges are the boundary of what it was trained on. I want you to find those edges in a real AI by experimenting. This is called “knowledge boundary probing” and it’s a real technique used by AI researchers. You’re going to do a beginner version right now.
- Open ChatGPT, Claude, or Gemini (any free version). Start a new conversation.
- Ask about something that happened a long time ago — like a historical event from 100 years ago. Does it answer confidently and correctly?
- Ask about something that happened recently — like news from the past few weeks. What happens? Does it say it doesn’t know, or does it confidently make something up?
- Ask about something totally made up — invent a fake news event, a fake celebrity, a fake product. Does it say “I don’t know about that” or does it confidently describe a thing that doesn’t exist?
- When it makes something up confidently, that’s called a “hallucination.” Write down: why do you think the AI does this? (Hint: what does it mean that AI is always just predicting the next word?)
Questions and Answers
How long does it take to train an AI like ChatGPT?
A very long time, and a lot of money. Training GPT-4 reportedly cost more than $100 million and took months of continuous computation on thousands of specialised computer chips. That’s why trained AI models are so valuable: they represent enormous investment. For comparison, a simple spam filter could be trained on a laptop in a few minutes. Scale matters enormously in AI: the bigger and better the model, the more training it needs.
Does the AI keep learning while I’m talking to it?
In most cases, no. When you chat with ChatGPT, the underlying model doesn’t change. It’s frozen. Your conversation creates “context” (the AI can see what you said earlier in the chat), but that doesn’t change its weights. Once the conversation ends, the model itself has learned nothing from it; anything a product “remembers” between chats is stored separately and fed back in as context. The AI you’re talking to tomorrow is the same model as today unless the company deliberately retrains or updates it, which tends to happen every few months or so.
What’s fine-tuning?
Fine-tuning is when a company takes a big, already-trained AI and does some additional training on a smaller, more specific dataset. It’s like the AI went to school, learned everything general, and now is doing an apprenticeship in a specific skill. Companies fine-tune models to make them better at specific tasks — like customer service, or coding, or medical questions. It’s much cheaper than training from scratch. The risk: if the fine-tuning data has problems (bias, errors, or deliberately poisoned examples), those problems get added on top of an otherwise good model.
Can I look at the weights of an AI model to understand it?
Technically yes. Many models are released with open weights (often loosely called “open source”), so you can download the weights file. But reading the weights tells you almost nothing, because the knowledge isn’t stored in a human-readable way. Billions of decimal numbers don’t come with explanations. There’s a whole field of research called “interpretability” that tries to understand what’s encoded in model weights, and it’s incredibly hard. Essentially, AI models are black boxes that even their creators can’t fully explain. That opacity is itself a security and safety concern.
Why does AI sometimes give different answers to the same question?
Most AI systems include a setting called “temperature” that adds some randomness to the generation process. At high temperature, the AI sometimes picks less-likely word predictions, making responses more varied and creative. At low temperature, it sticks to the most likely predictions, giving more consistent answers. This is why asking ChatGPT the same question twice might give you slightly different wording or emphasis. The choice is random but not arbitrary: the AI samples from a probability distribution over possible next words, and that randomness is there by design.
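Here’s a small sketch of what temperature does to the numbers. The three candidate words and their scores are invented, and real systems work over tens of thousands of possible tokens, but the arithmetic is the standard one: divide each score by the temperature, then turn the results into probabilities.

```python
import math
import random

# Hypothetical model scores for the next word (higher = more likely).
NEXT_WORD_SCORES = {"dog": 2.0, "cat": 1.5, "dragon": 0.2}

def word_probabilities(scores, temperature):
    """Divide scores by the temperature, then convert them to probabilities (softmax)."""
    scaled = {word: s / temperature for word, s in scores.items()}
    biggest = max(scaled.values())   # subtract the max for numerical stability
    exps = {word: math.exp(v - biggest) for word, v in scaled.items()}
    total = sum(exps.values())
    return {word: e / total for word, e in exps.items()}

def pick_word(scores, temperature, rng):
    """Sample one word according to its probability."""
    probs = word_probabilities(scores, temperature)
    return rng.choices(list(probs), weights=list(probs.values()))[0]

low = word_probabilities(NEXT_WORD_SCORES, temperature=0.2)
high = word_probabilities(NEXT_WORD_SCORES, temperature=2.0)
print("low temperature: ", {w: round(p, 2) for w, p in low.items()})
print("high temperature:", {w: round(p, 2) for w, p in high.items()})

rng = random.Random(0)
print([pick_word(NEXT_WORD_SCORES, 2.0, rng) for _ in range(8)])
```

At low temperature almost all of the probability piles onto the top word. At high temperature the gaps shrink, so the less likely words get picked more often.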
What is the difference between training and testing in AI?
Training is when the AI learns from examples — its settings (weights) are changing with every example it sees. Testing is when you check how good the AI has become by showing it new examples it has never seen, with the weights frozen. You always test on different data than you trained on — otherwise you’d just be measuring how well it memorised the practice examples, not how well it learned the actual pattern. Good test performance = the AI actually generalised. Bad test performance = overfitting, or the training data wasn’t diverse enough.
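The “test on different data” rule usually starts with a split like the one sketched here. The `split_train_test` helper, the 80/20 ratio, and the photo names are assumptions for illustration: shuffle once, then set aside a slice that training never gets to see.

```python
import random

def split_train_test(examples, test_fraction=0.2, seed=0):
    """Shuffle once, then hold back a slice that training never sees."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    test_size = max(1, round(len(shuffled) * test_fraction))
    return shuffled[test_size:], shuffled[:test_size]

photos = [f"photo_{i}" for i in range(10)]  # hypothetical dataset
train_set, test_set = split_train_test(photos)
print(len(train_set), len(test_set))
print("overlap:", set(train_set) & set(test_set))
```

The important property is the empty overlap. If even a few test examples leak into training, the test score starts measuring memory instead of learning.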
Further Reading
- LLM Day 8: Data and Model Poisoning — the advanced version of today’s attack
- Adversarial Machine Learning Attacks — the offensive follow-up
- LLM Hacking Hub — the full advanced series this course leads into
- OWASP LLM Top 10 — includes training and supply chain risks
- MITRE ATT&CK — adversarial ML techniques in the official framework

