Chapter 4

The Playground

Build any network you want. Choose a dataset, tweak the architecture, and watch it learn in real time.

🎮 Welcome to Your Neural Network Sandbox

This is where everything from the previous chapters comes together. The playground below is a sandbox — a place where you can experiment freely, break things, and learn by doing. There are no wrong answers here. Try wild combinations! Make the network fail spectacularly! That's how you build intuition.

🎯 Your goal: By the end of this chapter, you should be able to pick a dataset, design a network architecture, choose sensible settings, and successfully train the network to classify the data. If you can do that for the Spiral dataset, you're officially doing machine learning!

Real machine learning engineers spend a huge amount of time doing exactly this — tweaking settings, running experiments, watching what happens, and building an instinct for what works. This playground gives you that same experience, compressed into a few minutes instead of a few years. Every great ML practitioner started by playing.

Before you start clicking, let's take a tour of what you're looking at. The playground has three panels, each showing you something different about the network:

📋 Left Panel — Configuration

This is your control center. You'll choose what kind of data the network should learn, how big the network should be, and how it should learn. Think of it as the "settings" for your experiment. Every option is explained in detail below — don't worry if the names sound intimidating, they're simpler than they sound.

The left panel has four cards: Dataset (pick your puzzle), Architecture (design the network shape), Settings (fine-tune learning behavior), and Controls (play/pause/reset). You'll also see live stats at the bottom showing the current epoch (training step), loss (how wrong the network is), and parameter count (total number of knobs the network is adjusting).

🎨 Center Panel — The Decision Boundary

This is the main event! The large colorful canvas shows you what the network "sees." Here's how to read it:

Purple/dark regions = the network predicts "class 1" here
Light/pale regions = the network predicts "class 0" here
Colored dots = the actual training data. Each dot belongs to one of two classes.
A well-trained network will have purple regions surrounding class-1 dots and light regions surrounding class-0 dots.

As the network trains, watch the colors shift and flow like liquid — that's the network literally changing its mind about how to classify different regions of space. Early on, the boundary is usually a simple line or blob. As training progresses, it gets more detailed, curving around individual data points. This is the network learning!

If you see the boundary getting too complex — curving wildly between individual dots — that's a sign of overfitting (the network memorizing instead of learning). We'll talk about how to fix that with regularization below.
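If you're curious how a canvas like this can be drawn, here is a minimal sketch in plain Python. It asks a prediction function for its answer at every point of a grid and prints one character per cell. The names render_boundary and toy_predict are made up for illustration, and toy_predict is only a stand-in for a trained network; this is not the playground's actual rendering code.

```python
from typing import Callable, List

def render_boundary(predict: Callable[[float, float], float],
                    size: int = 21, extent: float = 1.0) -> List[str]:
    """Sample predict(x, y) on a size x size grid and draw one character per cell."""
    rows = []
    for r in range(size):
        # Row 0 is the top of the canvas, so y runs from +extent down to -extent.
        y = extent - 2 * extent * r / (size - 1)
        row = ""
        for c in range(size):
            x = -extent + 2 * extent * c / (size - 1)
            # '#' plays the role of the purple "class 1" region, '.' the pale "class 0" region.
            row += "#" if predict(x, y) >= 0.5 else "."
        rows.append(row)
    return rows

# Hypothetical stand-in for a trained network: "class 1" inside a disc of radius 0.5.
def toy_predict(x: float, y: float) -> float:
    return 1.0 if x * x + y * y < 0.25 else 0.0

picture = render_boundary(toy_predict)
print("\n".join(picture))
```

Swap toy_predict for any function that returns a number between 0 and 1 and the same loop will draw its boundary.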

📊 Right Panel — Network Structure & Loss

The top card shows a diagram of your network — circles are neurons, lines are connections (weights). The thicker or more colorful a connection, the stronger that weight is. When you change the architecture (add layers, adjust neurons), this diagram updates instantly so you can see what you're building.

Below the network diagram is the loss curve, your progress tracker. The x-axis is time (training epochs) and the y-axis is how wrong the network is. You want this line to go down and stay down. When it flattens near zero, your network has learned the pattern. (On the overlapping Gaussian dataset it will flatten above zero, because some dots are genuinely ambiguous.) If it bounces around instead of going down, your learning rate is probably too high. If it barely moves, your learning rate is probably too low or your network is too small for the problem.

Below both of those is a narration box that gives you a plain-English description of what's happening during training. It'll tell you when the network is just guessing, when it's making progress, and when it's nailed the dataset. Think of it as your personal ML coach, commenting on your network's performance in real time.

Together, these three panels give you a complete picture: what you configured (left), what the network learned (center), and how it got there (right). This is essentially the same monitoring setup that real ML engineers use, just simplified and visual.

⚙️ Understanding Every Setting

Let's go through each option so you know exactly what you're tweaking. Understanding these settings is the difference between "randomly clicking buttons" and "actually doing machine learning." Each one corresponds to a real decision that ML engineers make every day.

Don't try to memorize all of this before playing — read through it once, then come back and re-read specific sections as you experiment. Learning by doing is the whole point of this chapter.

📐 Dataset

The dataset is the shape of data the network needs to learn to separate. Think of it as the "puzzle" you're giving the network. Each dataset places dots on the canvas in a different pattern — some dots are one color (class 0) and others are a different color (class 1). The network's job is to figure out the rule that separates the two groups and draw a boundary between them.

The datasets range from easy to very hard (the difficulty ranking follows the list). Start with Circle to build confidence, then work your way up to the harder ones:

Circle — One class forms a ring around the other. Like a bullseye target. A simple network can handle this.
XOR — The diagonal pattern from Chapter 3. Dots in opposite corners belong to the same class. Needs at least one hidden layer.
Spiral — Two classes wound around each other like a cinnamon roll. This is hard — the boundary needs to twist and turn, requiring multiple layers.
Gaussian — Two overlapping blobs of dots. Like two clouds that partially overlap. Some points near the boundary are genuinely ambiguous — even the "perfect" classifier would get some wrong. This teaches an important lesson: sometimes there's no perfect answer.
Checkerboard — A grid pattern like a chess board. Each square alternates between classes. Very hard — the network needs to learn multiple separate regions simultaneously. This is a great stress test for your network architecture.

Difficulty ranking: Circle (⭐) → Gaussian (⭐⭐) → XOR (⭐⭐⭐) → Spiral (⭐⭐⭐⭐) → Checkerboard (⭐⭐⭐⭐⭐). If you can get a network to nail the Checkerboard with low loss, you're doing great!

Each time you select a new dataset (or click Regenerate), 300 random data points are placed on the canvas. The randomness means every run is slightly different — which is actually realistic! In real ML, your training data is always a finite, imperfect sample of the true pattern.
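To make "300 random data points" concrete, here is a rough sketch of how three of these datasets could be generated. Everything specific in it is an assumption for illustration: the function names, the radii, the number of spiral turns, the noise level, and which class gets label 1. The playground's own generators may differ.

```python
import math
import random
from typing import List, Tuple

Point = Tuple[float, float, int]  # (x, y, label)

def make_circle(n: int, rng: random.Random) -> List[Point]:
    """Assumed layout: class 1 inside a disc, class 0 in a ring around it."""
    points = []
    for i in range(n):
        label = i % 2
        radius = rng.uniform(0.0, 0.4) if label == 1 else rng.uniform(0.6, 1.0)
        angle = rng.uniform(0.0, 2 * math.pi)
        points.append((radius * math.cos(angle), radius * math.sin(angle), label))
    return points

def make_xor(n: int, rng: random.Random) -> List[Point]:
    """Opposite quadrants share a class: label 1 when x and y have different signs."""
    points = []
    for _ in range(n):
        x, y = rng.uniform(-1, 1), rng.uniform(-1, 1)
        points.append((x, y, 1 if x * y < 0 else 0))
    return points

def make_spiral(n: int, rng: random.Random, noise: float = 0.02) -> List[Point]:
    """Two arms wound around each other; arm 1 is arm 0 rotated by 180 degrees."""
    points = []
    for i in range(n):
        label = i % 2
        t = rng.uniform(0.0, 1.0)                  # position along the arm
        angle = 3 * math.pi * t + label * math.pi  # 1.5 turns from center to edge
        x = t * math.cos(angle) + rng.gauss(0, noise)
        y = t * math.sin(angle) + rng.gauss(0, noise)
        points.append((x, y, label))
    return points

rng = random.Random(0)  # fixed seed so the sample is reproducible
data = make_spiral(300, rng)
print(len(data), data[0])
```

Changing the seed plays the role of the Regenerate button: same pattern, different dots.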

🏗️ Architecture (Hidden Layers)

This is arguably the most important setting — it controls the shape of your brain. Specifically, it determines how many hidden layers your network has and how many neurons are in each one.

Use the "+ Layer" and "- Layer" buttons to add or remove layers (up to 6 layers), and the sliders to set how many neurons each layer has (from 1 to 12). Each layer is labeled L1, L2, L3, etc.

Why does this matter? A single neuron can only draw a straight line. A single layer of neurons can draw multiple straight lines and combine them into simple shapes. But for complex patterns like spirals or checkerboards, you need multiple layers that build on each other — each layer detecting more abstract features than the last.

Think of it like this: each layer is a level of thinking. One layer can detect simple features ("is the dot on the left or right?"). Two layers can combine those features ("is the dot in the top-left quadrant?"). Three layers can detect even more abstract patterns ("is the dot on this spiral arm?").
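The "straight line versus layers" idea can be checked on the four corner points of XOR. The sketch below uses hand-picked weights (nothing is trained) and invented function names: a single neuron is tried with a whole grid of weights and never gets all four corners right, while a tiny network with one hidden layer of two ReLU neurons does.

```python
def relu(z: float) -> float:
    return max(0.0, z)

def single_neuron(x: float, y: float, w1: float, w2: float, b: float) -> int:
    """One neuron: the boundary w1*x + w2*y + b = 0 is always a straight line."""
    return 1 if w1 * x + w2 * y + b > 0 else 0

def tiny_xor_net(x: float, y: float) -> float:
    """One hidden layer of two ReLU neurons, with hand-picked weights."""
    h1 = relu(x + y)        # grows as soon as at least one input is on
    h2 = relu(x + y - 1.0)  # only fires when both inputs are on
    return h1 - 2.0 * h2    # the output neuron combines the two features

corners = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
targets = [0, 1, 1, 0]

print([tiny_xor_net(x, y) for x, y in corners])  # [0.0, 1.0, 1.0, 0.0]

# Try every weight combination on a coarse grid for the single neuron.
grid = [v / 2 for v in range(-4, 5)]  # -2.0, -1.5, ..., 2.0
solved = any(
    all(single_neuron(x, y, w1, w2, b) == t for (x, y), t in zip(corners, targets))
    for w1 in grid for w2 in grid for b in grid
)
print(solved)  # False: no single straight line separates XOR
```

The grid search is only a demonstration, but the result holds for any weights: XOR is not linearly separable.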

More layers = can learn more complex patterns, but takes longer to train and needs more data. More neurons per layer = more nuance at each thinking level. A common beginner mistake is making the network way too big for a simple problem — start small and add complexity only when needed.

Rule of thumb: Start with 1–2 layers of 4 neurons each. If the network can't learn the pattern after training, add more. If it fits the training dots quickly but the boundary looks wiggly and overfit, you probably have more neurons than the problem needs: try reducing them or adding regularization.
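Here is a small sketch of how a parameter count like the one in the live stats can be worked out from the architecture. It assumes the network takes 2 inputs (the x and y position of a dot) and has 1 output neuron, and that every neuron has one weight per incoming connection plus one bias. Those input and output sizes are assumptions, so the playground's own number may differ.

```python
from typing import List

def count_parameters(hidden: List[int], n_inputs: int = 2, n_outputs: int = 1) -> int:
    """Count weights and biases in a fully connected network."""
    sizes = [n_inputs] + hidden + [n_outputs]
    total = 0
    for fan_in, fan_out in zip(sizes, sizes[1:]):
        total += fan_in * fan_out  # one weight per connection between the two layers
        total += fan_out           # one bias per neuron in the receiving layer
    return total

print(count_parameters([4, 4]))    # 37
print(count_parameters([12] * 6))  # 829
```

Notice how quickly the count grows: going from two layers of 4 to six layers of 12 multiplies the number of knobs by more than twenty.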

📏 Learning Rate (LR)

If architecture is the most important structural choice, learning rate is the most important training choice. It controls how big each weight adjustment step is when the network learns.

Imagine you're walking toward a target in a dark room, and after each step someone tells you which direction to go:

High learning rate (like 0.5 or 1.0) = taking big steps. You'll get close fast, but you might overshoot and keep jumping back and forth past the target.
Low learning rate (like 0.001) = taking tiny baby steps. You'll get there eventually, but it takes forever.
Just right (usually 0.01–0.1) = steady progress toward the goal.

Recommended starting point: Try 0.05. This works well for most datasets in this playground. Once you're comfortable, experiment with extremes to see what happens — that's the whole point!

Try cranking the learning rate to 1.0 and watch the loss curve bounce around wildly. Then try 0.001 and watch it barely move. Finding the sweet spot is a real skill in machine learning — in industry, people often try dozens of learning rates to find the best one for their specific problem.
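The same three behaviors show up on the simplest possible "loss landscape". The sketch below runs gradient descent on loss(w) = w**2, a toy stand-in chosen for illustration rather than the playground's real loss, so the exact rates at which overshooting starts are specific to this example.

```python
from typing import List

def descend(lr: float, steps: int = 50, start: float = 1.0) -> List[float]:
    """Gradient descent on loss(w) = w**2, whose gradient is 2*w."""
    w = start
    history = [w]
    for _ in range(steps):
        w = w - lr * 2 * w  # step against the gradient, scaled by the learning rate
        history.append(w)
    return history

for lr in (1.0, 0.05, 0.001):
    final = descend(lr)[-1]
    print(f"lr={lr}: w after 50 steps = {final:.4f}")
```

With lr=1.0 every step jumps from w to -w, so the value bounces across the target forever. With lr=0.001 it has barely left the starting point after 50 steps. With lr=0.05 it shrinks by 10% per step and ends up close to the minimum at 0.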

🔧 Activation Function

The activation function is the "squishing function" each neuron uses to decide its output. Without an activation function, the network would just be doing simple linear math — it could only draw straight-line boundaries no matter how many layers you add. The activation function is what gives neural networks their power to learn curves, spirals, and complex shapes.

You met these in earlier chapters, but here's a refresher:

ReLU (Rectified Linear Unit) — The most popular activation in modern deep learning. Dead simple: if the input is positive, pass it through unchanged. If negative, output zero. This creates sharp, angular decision boundaries. It's fast to compute and works great in practice. Start here unless you have a reason not to.
Sigmoid — Squishes everything into a smooth curve between 0 and 1. Creates gentle, rounded boundaries. It was the go-to activation in the 1990s but has fallen out of favor for hidden layers because it can cause "vanishing gradients" — the learning signal gets weaker in deeper networks, making training slow.
Tanh — Squishes everything between -1 and +1. Like sigmoid but centered at zero, which often helps the network learn faster. A solid middle-ground choice. Try comparing Tanh vs Sigmoid on the same dataset — you'll often see Tanh converge faster.
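For reference, here are the three functions written out in plain Python, along with a quick look at why sigmoid can cause vanishing gradients. The helper sigmoid_slope is an illustrative name; the formulas themselves are the standard definitions.

```python
import math

def relu(z: float) -> float:
    """Pass positive inputs through unchanged, output zero otherwise."""
    return z if z > 0 else 0.0

def sigmoid(z: float) -> float:
    """Squash any input into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def tanh(z: float) -> float:
    """Squash any input into the range (-1, 1), centered at zero."""
    return math.tanh(z)

def sigmoid_slope(z: float) -> float:
    """Derivative of sigmoid; it is never larger than 0.25."""
    s = sigmoid(z)
    return s * (1.0 - s)

for z in (-2.0, 0.0, 2.0):
    print(f"z={z:+.1f}  relu={relu(z):.3f}  sigmoid={sigmoid(z):.3f}  tanh={tanh(z):.3f}")

# The learning signal is multiplied by one slope per layer on its way back.
# Even in the best case (slope 0.25 at every layer) it shrinks fast:
print(0.25 ** 5)  # 0.0009765625
```

That last number is the largest share of the learning signal that can survive five sigmoid layers from the slopes alone, which is why deeper sigmoid networks tend to train slowly.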

Quick comparison tip: Try the Checkerboard dataset with each activation and the same architecture. ReLU tends to create blocky, grid-like boundaries (which actually suits checkerboard well!). Sigmoid creates smoother, rounder regions.

🛡️ L2 Regularization

L2 regularization is a technique that prevents the network from memorizing the training data too exactly. It's like telling a student: "Learn the concepts, not the specific answers to the practice test."

Without regularization, a powerful network might create a super-complex, wiggly boundary that perfectly fits every single training dot — including the noisy or misleading ones. This is called overfitting, and it's one of the biggest problems in machine learning. A network that overfits has essentially memorized the answers instead of learning the pattern — like a student who memorizes the answer key but can't solve a new problem.

L2 regularization adds a small penalty for having large weights, which encourages the network to keep things simpler. Mathematically, it adds the sum of all squared weights (multiplied by a small number) to the loss. The network is now trying to minimize two things at once: the prediction error AND the size of its weights. This forces it to find the simplest boundary that still works.

L2 = 0 — No regularization. The network can go wild.
L2 = small value (0.001–0.005) — Gentle nudge toward simplicity. Usually a good idea.
L2 = large value (0.01+) — Strong simplicity constraint. The boundary will be very smooth, but might be too simple to fit the data well (called underfitting).

The sweet spot depends on your network size and dataset complexity. A big network on a simple dataset needs more regularization. A small network on a hard dataset might not need any — it's already constrained by its size. Experiment and watch the boundary!
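The "two things at once" idea fits in a few lines. This is a sketch of the penalty as described above: the sum of squared weights times a small number, added to the prediction error. The function names are invented for illustration, and conventions vary between libraries (some halve the penalty, and biases are often left out), so treat the exact form as an assumption.

```python
from typing import List

def l2_penalty(weights: List[float], l2: float) -> float:
    """Small number times the sum of all squared weights."""
    return l2 * sum(w * w for w in weights)

def total_loss(data_loss: float, weights: List[float], l2: float) -> float:
    """What the network actually minimizes: prediction error plus weight penalty."""
    return data_loss + l2_penalty(weights, l2)

def l2_gradient(weights: List[float], l2: float) -> List[float]:
    """Extra gradient from the penalty: it pulls every weight toward zero."""
    return [2.0 * l2 * w for w in weights]

weights = [3.0, -4.0, 0.5]
for l2 in (0.0, 0.005, 0.05):
    print(f"L2={l2}: total loss = {total_loss(0.10, weights, l2):.5f}")
```

With L2 = 0 the weights are free. As L2 grows, the same large weights cost more and more, so the network is pushed toward smaller weights and a smoother boundary.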

💡 Quick Reference Card

Network too simple? → Add layers or neurons
Loss bouncing wildly? → Lower the learning rate
Loss barely moving? → Raise the learning rate, or the network is too small
Boundary too wiggly? → Increase L2 regularization or reduce network size
Boundary too smooth? → Decrease L2 regularization or add more neurons
Training seems stuck? → Reset and try again (random initialization matters!)

🧪 The Playground

Alright, you know what everything does — now go play! Hit ▶ Train to start training, ⏸ Pause to freeze training so you can examine the boundary, and Reset Network to start fresh with new random weights (the data stays the same). Use Regenerate to create a new random set of data points for the current dataset type.

Pro tip: Pause the training periodically to study the decision boundary. How does it compare to what you'd draw by hand? Are there regions where the network is clearly wrong? What would help — more neurons, more layers, or different settings?

Remember: Every time you hit "Reset Network," the weights are randomized again. This means two identical networks can learn differently! If training seems stuck or weird, just reset and try again — sometimes you just got unlucky with the initial random weights.
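If you want to see what Train and Reset Network might be doing under the hood, here is a deliberately tiny sketch: one hidden layer of tanh neurons, a sigmoid output, cross-entropy loss, and plain full-batch gradient descent on a toy circle dataset. All names, the loss function, the weight range, and the update rule are assumptions for illustration. The playground's real implementation is not shown in this chapter and may work differently.

```python
import math
import random

def make_data(n, rng):
    """Toy circle data: label 1 inside radius 0.5, label 0 outside."""
    data = []
    for _ in range(n):
        x, y = rng.uniform(-1, 1), rng.uniform(-1, 1)
        data.append((x, y, 1.0 if x * x + y * y < 0.25 else 0.0))
    return data

def init_network(hidden, rng):
    """'Reset Network': draw fresh random weights for one hidden layer."""
    return {
        "w1": [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(hidden)],
        "b1": [0.0] * hidden,
        "w2": [rng.uniform(-1, 1) for _ in range(hidden)],
        "b2": 0.0,
    }

def forward(net, x, y):
    """Return the hidden activations and the predicted probability of class 1."""
    h = [math.tanh(w[0] * x + w[1] * y + b) for w, b in zip(net["w1"], net["b1"])]
    z = sum(w * hj for w, hj in zip(net["w2"], h)) + net["b2"]
    return h, 1.0 / (1.0 + math.exp(-z))

def train_epoch(net, data, lr):
    """One full-batch gradient descent step; returns the mean loss before the step."""
    n, hidden = len(data), len(net["w2"])
    g_w1 = [[0.0, 0.0] for _ in range(hidden)]
    g_b1, g_w2, g_b2, loss = [0.0] * hidden, [0.0] * hidden, 0.0, 0.0
    for x, y, t in data:
        h, p = forward(net, x, y)
        p = min(max(p, 1e-12), 1 - 1e-12)  # keep log() away from zero
        loss -= t * math.log(p) + (1 - t) * math.log(1 - p)
        dz = p - t  # gradient at the output for sigmoid + cross-entropy
        g_b2 += dz
        for j in range(hidden):
            g_w2[j] += dz * h[j]
            dh = dz * net["w2"][j] * (1 - h[j] ** 2)  # back through tanh
            g_w1[j][0] += dh * x
            g_w1[j][1] += dh * y
            g_b1[j] += dh
    for j in range(hidden):
        net["w2"][j] -= lr * g_w2[j] / n
        net["w1"][j][0] -= lr * g_w1[j][0] / n
        net["w1"][j][1] -= lr * g_w1[j][1] / n
        net["b1"][j] -= lr * g_b1[j] / n
    net["b2"] -= lr * g_b2 / n
    return loss / n

data = make_data(100, random.Random(0))  # same data for both runs
results = {}
for seed in (1, 2):                      # two different "resets"
    net = init_network(4, random.Random(seed))
    results[seed] = [train_epoch(net, data, lr=0.05) for _ in range(300)]
    print(f"seed {seed}: loss {results[seed][0]:.3f} -> {results[seed][-1]:.3f}")
```

Both runs use the same data, architecture, and learning rate. Only the starting weights differ, and that alone is enough to give them different loss curves.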

[Interactive playground: Dataset, Architecture, and Settings cards (defaults shown: LR 0.05, L2 0), with live Epoch, Loss, and Parameters readouts.]

🧪 Try This!

Not sure where to start? Here are some guided experiments that will teach you a ton. Work through them in order — each one builds on the intuition from the previous one. Don't just read them — actually do each step in the playground above!

⏱️ Time estimate: Each experiment takes 2–5 minutes. The whole set takes about 20 minutes.

Experiment 1: The Power of Layers

Select the Spiral dataset.
Remove layers until you have just 1 hidden layer with 4 neurons.
Hit Train. Watch it struggle — the boundary can't twist enough to follow the spiral.
Now reset, add a second layer (+ Layer button), and train again. Notice the improvement.
Add a third layer. Now the network can trace the spiral beautifully!

Lesson: More layers let the network draw increasingly complex boundaries. But each layer adds training time and parameters.

Experiment 2: Learning Rate Gone Wrong

Use the Circle dataset with 2 layers of 4 neurons each.
Set the learning rate to 1.0 (drag the LR slider all the way right).
Hit Train. Watch the loss curve — it bounces wildly instead of going down smoothly!
Reset. Now set LR to 0.001. Train. The loss drops... but painfully slowly.
Reset. Try 0.05. Smooth, steady learning. Just right!

Lesson: Learning rate is one of the most important settings in machine learning. Too high and the network can't settle down — it keeps overshooting the optimal weights. Too low and you'll be waiting all day. In practice, ML engineers often start with a moderate learning rate and decrease it over time (called a "learning rate schedule"), but for this playground, a fixed rate works fine.
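A learning rate schedule can be as simple as shrinking the rate a little every epoch. The sketch below shows exponential decay, one common option among several. The function name and the numbers (starting at 0.1 and multiplying by 0.995 each epoch) are illustrative assumptions, not recommended values.

```python
def exponential_decay(initial_lr: float, decay: float, epoch: int) -> float:
    """Learning rate after `epoch` epochs when multiplied by `decay` each epoch."""
    return initial_lr * decay ** epoch

for epoch in (0, 100, 500, 1000):
    lr = exponential_decay(0.1, 0.995, epoch)
    print(f"epoch {epoch:4d}: lr = {lr:.5f}")
```

Early on the rate is high, so the network makes big moves. Later it is small, so the network can settle instead of overshooting.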

Fun fact: In real-world deep learning, there are entire research papers dedicated just to figuring out better ways to set the learning rate. It's that important!

Experiment 3: Overfitting vs. Regularization

Select the Gaussian dataset (two overlapping blobs).
Use 3 layers with lots of neurons (like 8, 8, 8). Keep L2 at 0.
Train for a while. Notice how the boundary gets really wiggly, trying to perfectly classify every single dot — even the ones deep in the "wrong" territory.
Now reset, set L2 to 0.005, and train again. The boundary is smoother and more sensible — it accepts that some dots near the overlap are just hard to classify.

Lesson: A network that's too powerful will memorize noise. Regularization keeps it honest. This is one of the most important concepts in all of machine learning — the balance between a model that's too simple (underfitting) and one that's too complex (overfitting).

Experiment 4: Activation Function Showdown

Select the Checkerboard dataset with 3 layers (6, 6, 4).
Set activation to ReLU. Train for ~2000 epochs. Pause and look at the boundary shape — notice the sharp, angular edges.
Reset. Switch to Sigmoid, same architecture. Train again. The boundary is rounder, smoother — and it might learn slower.
Reset. Try Tanh. Often the fastest to converge on this dataset.

Lesson: The activation function shapes the "vocabulary" of boundaries the network can draw. Different activations suit different problems.

Experiment 5: The Impossible Checkerboard

Select the Checkerboard dataset.
Start with 1 layer, 4 neurons. Train. It barely makes a dent!
Try 3 layers (8, 6, 4) with ReLU. Train for a while. It should start to carve out the checkerboard pattern, though it's tough.
This is one of the hardest patterns — it shows why some problems need really deep networks.
If you're feeling ambitious, try to get the loss below 0.05. It's possible but takes patience and the right architecture!

Lesson: Some patterns are genuinely hard, and no amount of clever settings can replace having enough network capacity. When your network can't learn, sometimes the answer is simply "make it bigger."

Experiment 6: The Minimalist Challenge

Select the Circle dataset — the easiest one.
Use just 1 hidden layer with 2 neurons. Can it learn the circle? (Probably not well!)
Try 1 layer with 3 neurons. Better?
What's the smallest network that can perfectly classify the circle? Try to find it!

Lesson: In real ML, a smaller model that solves the problem is usually the better choice: it's faster, uses less memory, and is less prone to overfitting. Finding the smallest network that solves your problem is an art. This is Occam's Razor applied to machine learning: prefer the simplest model that explains the data.

🎯 Create Your Own Challenge

Once you've done the guided experiments, try setting your own goals:

Can you solve Spiral with loss below 0.01? What's the smallest network that achieves it?
Can you find a learning rate that works for ALL datasets without changing it?
What happens if you set every layer to just 1 neuron? Can any dataset be learned?
Try training, pausing, changing the learning rate, then resuming. Does a "learning rate schedule" (starting high, going low) help?
What's the maximum number of parameters you can create? (Hint: 6 layers of 12 neurons each!) Does more always mean better?

📝 What You Just Learned

If you've worked through even a few of those experiments, you've already developed real intuition about neural networks. Let's crystallize the key takeaways:

A neural network playground lets you experiment with different datasets, architectures, and settings to build intuition about how networks learn.
The decision boundary (the colored canvas) shows you what the network has learned — where it draws the line between the two classes.
The loss curve tells you how the network is progressing. Down = good. Flat near zero = done learning. Bouncing = learning rate might be too high.
Harder datasets (spiral, checkerboard) need deeper networks with more layers and neurons.
The learning rate is a critical knob — too high and the network overshoots, too low and it crawls.
L2 regularization prevents overfitting by encouraging simpler boundaries.
There's no single "best" network for everything — the right architecture depends on the problem. Experimentation is key!
Different activation functions create different boundary shapes — ReLU makes angular boundaries, Sigmoid/Tanh make smoother ones.
The parameter count tells you how complex your model is. More parameters = more expressive, but also more prone to overfitting and slower to train.
Finding the right combination of settings is more art than science — that's why playgrounds like this are so valuable for building intuition.
Random initialization matters — the same network trained twice can learn differently because it starts from different random weights. If training seems stuck, try resetting!

🗺️ The Bigger Picture

Everything you've experimented with here — choosing architectures, tuning learning rates, fighting overfitting — is exactly what machine learning engineers do at companies like Google, OpenAI, and Meta. The only differences are scale (they use millions of neurons instead of dozens) and data (they train on massive real-world datasets instead of 300 colored dots). But the principles? Identical.

The fact that you can now look at a loss curve and say "learning rate is too high" or see a wiggly boundary and think "needs regularization" means you've developed real ML intuition. That's not something you can learn from reading a textbook — it comes from doing, from playing, from failing and trying again. Which is exactly what you just did.

These aren't just toy concepts. Every one of these settings has a counterpart in the real neural networks that power ChatGPT, image recognition, self-driving cars, and more. Those networks are far bigger (millions or billions of parameters instead of the few dozen to few hundred you trained here), but the fundamental ideas carry over. You're learning the real thing.

You now have hands-on experience with the core concepts of neural networks: neurons, layers, activation functions, learning rates, and regularization. These are the building blocks of every neural network ever built — from the tiny playground networks you just trained to the massive language models that power AI assistants.

In the next chapter, we'll jump from numbers to words — and see how neural networks can understand language. How do you turn a sentence into numbers that a network can process? How does a neural network learn the meaning of words? 🚀