Chapter 4

The Playground

Build any network you want. Choose a dataset, tweak the architecture, and watch it learn in real time.

🎮 Welcome to Your Neural Network Sandbox

This is where everything from the previous chapters comes together. The playground below is a sandbox: a place where you can experiment freely, break things, and learn by doing. There are no wrong answers here. Try wild combinations! Make the network fail spectacularly! That's how you build intuition.

🎯 Your goal: By the end of this chapter, you should be able to pick a dataset, design a network architecture, choose sensible settings, and successfully train the network to classify the data. If you can do that for the Spiral dataset, you're officially doing machine learning!

Real machine learning engineers spend a huge amount of time doing exactly this: tweaking settings, running experiments, watching what happens, and building an instinct for what works. This playground gives you that same experience, compressed into a few minutes instead of a few years. Every great ML practitioner started by playing.

Before you start clicking, let's take a tour of what you're looking at. The playground has three panels, each showing you something different about the network:

📋 Left Panel: Configuration

This is your control center. You'll choose what kind of data the network should learn, how big the network should be, and how it should learn. Think of it as the "settings" for your experiment. Every option is explained in detail below. Don't worry if the names sound intimidating; they're simpler than they sound.

The left panel has four cards: Dataset (pick your puzzle), Architecture (design the network shape), Settings (fine-tune learning behavior), and Controls (play/pause/reset). You'll also see live stats at the bottom showing the current epoch (training step), loss (how wrong the network is), and parameter count (total number of knobs the network is adjusting).

🎨 Center Panel: The Decision Boundary

This is the main event! The large colorful canvas shows you what the network "sees." Here's how to read it:

As the network trains, watch the colors shift and flow like liquid: that's the network literally changing its mind about how to classify different regions of space. Early on, the boundary is usually a simple line or blob. As training progresses, it gets more detailed, curving around individual data points. This is the network learning!
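
If you're curious how a canvas like this could be drawn, here's a minimal sketch. It is not the playground's actual code: `predict` is a hypothetical stand-in for a trained network, and the 11-by-11 character grid is an assumption chosen to keep the output small. The idea is to ask the model for a prediction at every grid point and mark that point by the answer.

```python
def predict(x, y):
    # Hypothetical stand-in for a trained network: returns the
    # "probability" of class 1, here simply 1.0 inside a circle
    # of radius 0.6 and 0.0 outside it.
    return 1.0 if x * x + y * y < 0.36 else 0.0


def render(predict_fn, size=11):
    """Sample predict_fn on a size-by-size grid over [-1, 1] x [-1, 1]."""
    rows = []
    for row in range(size):
        y = 1.0 - 2.0 * row / (size - 1)      # top row is y = +1
        line = ""
        for col in range(size):
            x = -1.0 + 2.0 * col / (size - 1)  # left column is x = -1
            line += "#" if predict_fn(x, y) >= 0.5 else "."
        rows.append(line)
    return rows


for line in render(predict):
    print(line)
```

A real canvas does the same thing with many more grid points and a color gradient instead of two characters.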

If you see the boundary getting too complex (curving wildly between individual dots), that's a sign of overfitting (the network memorizing instead of learning). We'll talk about how to fix that with regularization below.

📊 Right Panel: Network Structure & Loss

The top card shows a diagram of your network: circles are neurons, lines are connections (weights). The thicker or more colorful a connection, the stronger that weight is. When you change the architecture (add layers, adjust neurons), this diagram updates instantly so you can see what you're building.

Below the network diagram is the loss curve, which is your progress tracker. The x-axis is time (training epochs) and the y-axis is how wrong the network is. You want this line to go down and stay down. When it flattens near zero, your network has learned the pattern. If it bounces around instead of going down, your learning rate is probably too high. If it barely moves, your learning rate is too low or your network is too small for the problem.

Below both of those is a narration box that gives you a plain-English description of what's happening during training. It'll tell you when the network is just guessing, when it's making progress, and when it's nailed the dataset. Think of it as your personal ML coach, commenting on your network's performance in real time.

Together, these three panels give you a complete picture: what you configured (left), what the network learned (center), and how it got there (right). This is essentially the same monitoring setup that real ML engineers use, just simplified and visual.

โš™๏ธ Understanding Every Setting

Let's go through each option so you know exactly what you're tweaking. Understanding these settings is the difference between "randomly clicking buttons" and "actually doing machine learning." Each one corresponds to a real decision that ML engineers make every day.

Don't try to memorize all of this before playing. Read through it once, then come back and re-read specific sections as you experiment. Learning by doing is the whole point of this chapter.

๐Ÿ“ Dataset

The dataset is the shape of data the network needs to learn to separate. Think of it as the "puzzle" you're giving the network. Each dataset places dots on the canvas in a different pattern: some dots are one color (class 0) and others are a different color (class 1). The network's job is to figure out the rule that separates the two groups and draw a boundary between them.

The datasets are ordered roughly by difficulty. Start with Circle to build confidence, then work your way up to the harder ones:

Difficulty ranking: Circle (⭐) → Gaussian (⭐⭐) → XOR (⭐⭐⭐) → Spiral (⭐⭐⭐⭐) → Checkerboard (⭐⭐⭐⭐⭐). If you can get a network to nail the Checkerboard with low loss, you're doing great!

Each time you select a new dataset (or click Regenerate), 300 random data points are placed on the canvas. The randomness means every run is slightly different, which is actually realistic! In real ML, your training data is always a finite, imperfect sample of the true pattern.
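
To make "300 random data points" concrete, here's a rough sketch of how two of these datasets could be generated. The coordinate range of -1 to 1, the circle radius of 0.5, and the function names are all assumptions for illustration; the playground's own generator may work differently.

```python
import random


def make_circle(n=300, radius=0.5, seed=0):
    """Return n points as (x, y, label); label 1 means inside the circle."""
    rng = random.Random(seed)
    points = []
    for _ in range(n):
        x = rng.uniform(-1.0, 1.0)
        y = rng.uniform(-1.0, 1.0)
        label = 1 if x * x + y * y < radius * radius else 0
        points.append((x, y, label))
    return points


def make_xor(n=300, seed=0):
    """Return n points as (x, y, label); label 1 when x and y differ in sign."""
    rng = random.Random(seed)
    points = []
    for _ in range(n):
        x = rng.uniform(-1.0, 1.0)
        y = rng.uniform(-1.0, 1.0)
        label = 1 if x * y < 0 else 0
        points.append((x, y, label))
    return points


circle = make_circle()
xor = make_xor()
print(len(circle), len(xor))
```

Changing `seed` plays the role of clicking Regenerate: the rule stays the same, but the dots land in different places.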

๐Ÿ—๏ธ Architecture (Hidden Layers)

This is arguably the most important setting: it controls the shape of your network's "brain." Specifically, it determines how many hidden layers your network has and how many neurons are in each one.

Use the "+ Layer" and "- Layer" buttons to add or remove layers (up to 6 layers max), and the sliders to set how many neurons each layer has (from 1 to 12). Each layer is labeled L1, L2, L3, etc.
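
The parameter count shown in the stats follows directly from these choices. Here's a small sketch of the arithmetic, assuming a fully connected network with 2 inputs (the x and y position of a dot), 1 output, and one bias per neuron. Those details are assumptions, so the playground's own number may differ if it is wired differently.

```python
def count_parameters(hidden_layers, n_inputs=2, n_outputs=1):
    """Count weights and biases in a fully connected network."""
    sizes = [n_inputs] + list(hidden_layers) + [n_outputs]
    total = 0
    for fan_in, fan_out in zip(sizes, sizes[1:]):
        total += fan_in * fan_out  # one weight per connection
        total += fan_out           # one bias per neuron
    return total


print(count_parameters([4]))     # (2*4 + 4) + (4*1 + 1) = 17
print(count_parameters([4, 4]))  # (2*4 + 4) + (4*4 + 4) + (4*1 + 1) = 37
```

Notice that adding a second layer of 4 neurons more than doubles the count. Parameters grow with the product of neighboring layer sizes, not their sum.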

Why does this matter? A single neuron can only draw a straight line. A single layer of neurons can draw multiple straight lines and combine them into simple shapes. But for complex patterns like spirals or checkerboards, you need multiple layers that build on each other, with each layer detecting more abstract features than the last.
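
Here's what "a single neuron can only draw a straight line" looks like in code. This is a minimal sketch with hand-picked, hypothetical weights: the neuron computes a weighted sum of x and y, and its output crosses 0.5 exactly where that sum is zero, which is the equation of a line.

```python
import math


def neuron(x, y, w1, w2, b):
    """One sigmoid neuron with two inputs."""
    z = w1 * x + w2 * y + b
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical weights: the boundary is the line x + y = 0.
w1, w2, b = 1.0, 1.0, 0.0

print(neuron(0.5, 0.5, w1, w2, b) > 0.5)    # True: above the line
print(neuron(-0.5, -0.5, w1, w2, b) > 0.5)  # False: below the line
print(neuron(0.3, -0.3, w1, w2, b))         # 0.5: exactly on the line
```

No choice of `w1`, `w2`, and `b` can bend that boundary. All they can do is rotate and shift the line.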

Think of it like this: each layer is a level of thinking. One layer can detect simple features ("is the dot on the left or right?"). Two layers can combine those features ("is the dot in the top-left quadrant?"). Three layers can detect even more abstract patterns ("is the dot on this spiral arm?").

More layers = can learn more complex patterns, but takes longer to train and needs more data. More neurons per layer = more nuance at each thinking level. A common beginner mistake is making the network way too big for a simple problem. Start small and add complexity only when needed.

Rule of thumb: Start with 1–2 layers of 4 neurons each. If the network can't learn the pattern after training, add more. If it learns too fast and the boundary looks wiggly and overfit, you might have too many neurons; try reducing them or adding regularization.

๐Ÿ“ Learning Rate (LR)

If architecture is the most important structural choice, learning rate is the most important training choice. It controls how big each weight adjustment step is when the network learns.

Imagine you're walking toward a target in a dark room, and after each step someone tells you which direction to go:

Recommended starting point: Try 0.05. This works well for most datasets in this playground. Once you're comfortable, experiment with extremes to see what happens. That's the whole point!

Try cranking the learning rate to 1.0 and watch the loss curve bounce around wildly. Then try 0.001 and watch it barely move. Finding the sweet spot is a real skill in machine learning. In industry, people often try dozens of learning rates to find the best one for their specific problem.
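
You can see the same three behaviors on a toy problem. The sketch below is an illustration, not the playground's training loop: it runs plain gradient descent on the one-weight loss `w * w`, whose minimum is at `w = 0`, using the three learning rates mentioned above.

```python
def descend(lr, steps=50, w=1.0):
    """Gradient descent on loss(w) = w * w, starting from w."""
    history = [w]
    for _ in range(steps):
        grad = 2.0 * w     # derivative of w * w
        w = w - lr * grad  # the learning rate scales the step
        history.append(w)
    return history


for lr in (1.0, 0.001, 0.05):
    final_w = descend(lr)[-1]
    print(f"lr={lr}: w after 50 steps = {final_w:.4f}")
```

With `lr=1.0` every step jumps from `w` to `-w`, so the weight bounces back and forth and never gets closer to 0. With `lr=0.001` it creeps toward 0 but is still far away after 50 steps. With `lr=0.05` it gets very close.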

🔧 Activation Function

The activation function is the "squishing function" each neuron uses to decide its output. Without an activation function, the network would just be doing simple linear math: it could only draw straight-line boundaries no matter how many layers you add. The activation function is what gives neural networks their power to learn curves, spirals, and complex shapes.
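
Here's a small demonstration of why stacking layers without an activation doesn't help. The weights are arbitrary, hypothetical values, and each "layer" is a single one-input neuron to keep the arithmetic easy to follow.

```python
def layer(x, w, b):
    """A one-input, one-output linear layer."""
    return w * x + b


def relu(z):
    return max(0.0, z)


def two_linear_layers(x):
    return layer(layer(x, 2.0, 1.0), -3.0, 0.5)


def one_linear_layer(x):
    # Combined weight: -3 * 2 = -6. Combined bias: -3 * 1 + 0.5 = -2.5.
    return layer(x, -6.0, -2.5)


def two_layers_with_relu(x):
    return layer(relu(layer(x, 2.0, 1.0)), -3.0, 0.5)


for x in (-2.0, -1.0, 0.0, 1.0):
    print(x, two_linear_layers(x), one_linear_layer(x), two_layers_with_relu(x))
```

The first two functions agree on every input, so the second layer added nothing. The ReLU version stays flat at 0.5 for inputs below -0.5 and then slopes downward, a bend that no single linear layer can produce.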

You met these in earlier chapters, but here's a refresher:

Quick comparison tip: Try the Checkerboard dataset with each activation and the same architecture. ReLU tends to create blocky, grid-like boundaries (which actually suits checkerboard well!). Sigmoid creates smoother, rounder regions.
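
For reference, these are the standard textbook definitions of the three activations used in the experiments in this chapter (ReLU, Sigmoid, and Tanh), written as a short sketch. Whether the playground implements them exactly this way is an assumption.

```python
import math


def sigmoid(z):
    """Squashes any input into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def tanh(z):
    """Squashes any input into the range (-1, 1)."""
    return math.tanh(z)


def relu(z):
    """Passes positive inputs through unchanged and zeroes out negatives."""
    return max(0.0, z)


for z in (-2.0, 0.0, 2.0):
    print(z, round(sigmoid(z), 3), round(tanh(z), 3), relu(z))
```

ReLU is built from two straight pieces with a sharp corner at zero, while Sigmoid and Tanh are smooth curves, which is one way to think about the blocky versus rounded boundaries described above.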

๐Ÿ›ก๏ธ L2 Regularization

L2 regularization is a technique that prevents the network from memorizing the training data too exactly. It's like telling a student: "Learn the concepts, not the specific answers to the practice test."

Without regularization, a powerful network might create a super-complex, wiggly boundary that perfectly fits every single training dot, including the noisy or misleading ones. This is called overfitting, and it's one of the biggest problems in machine learning. A network that overfits has essentially memorized the answers instead of learning the pattern, like a student who memorizes the answer key but can't solve a new problem.

L2 regularization adds a small penalty for having large weights, which encourages the network to keep things simpler. Mathematically, it adds the sum of all squared weights (multiplied by a small number) to the loss. The network is now trying to minimize two things at once: the prediction error AND the size of its weights. This forces it to find the simplest boundary that still works.
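
As a rough sketch of that sentence in code: the function names and example weights below are hypothetical, and details such as whether biases are included or whether the penalty is halved vary between implementations.

```python
def l2_penalty(weights, strength):
    """Sum of squared weights, multiplied by a small number."""
    return strength * sum(w * w for w in weights)


def total_loss(prediction_error, weights, strength):
    """What the network minimizes: prediction error plus the penalty."""
    return prediction_error + l2_penalty(weights, strength)


small_weights = [0.5, -0.5, 0.25]  # sum of squares = 0.5625
large_weights = [4.0, -3.0, 2.0]   # sum of squares = 29.0

# Same prediction error, same L2 strength, different weight sizes.
print(total_loss(0.1, small_weights, 0.005))
print(total_loss(0.1, large_weights, 0.005))
```

Both networks make the same prediction error here, but the one with large weights gets a noticeably higher total loss, so training is nudged toward the smaller weights.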

The sweet spot depends on your network size and dataset complexity. A big network on a simple dataset needs more regularization. A small network on a hard dataset might not need any, since it's already constrained by its size. Experiment and watch the boundary!

💡 Quick Reference Card
  • Network too simple? → Add layers or neurons
  • Loss bouncing wildly? → Lower the learning rate
  • Loss barely moving? → Raise the learning rate, or the network is too small
  • Boundary too wiggly? → Increase L2 regularization or reduce network size
  • Boundary too smooth? → Decrease L2 regularization or add more neurons
  • Training seems stuck? → Reset and try again (random initialization matters!)

🧪 The Playground

Alright, you know what everything does, so now go play! Hit ▶ Train to start training, ⏸ Pause to freeze training so you can examine the boundary, and Reset Network to start fresh with new random weights (the data stays the same). Use Regenerate to create a new random set of data points for the current dataset type.

Pro tip: Pause the training periodically to study the decision boundary. How does it compare to what you'd draw by hand? Are there regions where the network is clearly wrong? What would help: more neurons, more layers, or different settings?

Remember: Every time you hit "Reset Network," the weights are randomized again. This means two identical networks can learn differently! If training seems stuck or weird, just reset and try again; sometimes you just got unlucky with the initial random weights.

[Interactive playground widget: Dataset, Architecture, and Settings cards; Epoch, Loss, and Parameters readouts; Network diagram and Loss curve.]

Choose a dataset, build your network, and hit Train!

🧪 Try This!

Not sure where to start? Here are some guided experiments that will teach you a ton. Work through them in order, since each one builds on the intuition from the previous one. Don't just read them: actually do each step in the playground above!

⏱️ Time estimate: Each experiment takes 2–5 minutes. The whole set takes about 20 minutes.

Experiment 1: The Power of Layers

  1. Select the Spiral dataset.
  2. Remove layers until you have just 1 hidden layer with 4 neurons.
  3. Hit Train. Watch it struggle: the boundary can't twist enough to follow the spiral.
  4. Now reset, add a second layer (+ Layer button), and train again. Notice the improvement.
  5. Add a third layer. Now the network can trace the spiral beautifully!

Lesson: More layers let the network draw increasingly complex boundaries. But each layer adds training time and parameters.

Experiment 2: Learning Rate Gone Wrong

  1. Use the Circle dataset with 2 layers of 4 neurons each.
  2. Set the learning rate to 1.0 (drag the LR slider all the way right).
  3. Hit Train. Watch the loss curve: it bounces wildly instead of going down smoothly!
  4. Reset. Now set LR to 0.001. Train. The loss drops... but painfully slowly.
  5. Reset. Try 0.05. Smooth, steady learning. Just right!

Lesson: Learning rate is one of the most important settings in machine learning. Too high and the network can't settle down, because it keeps overshooting the optimal weights. Too low and you'll be waiting all day. In practice, ML engineers often start with a moderate learning rate and decrease it over time (called a "learning rate schedule"), but for this playground, a fixed rate works fine.
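
A learning rate schedule can be as simple as a function from the epoch number to a learning rate. The sketch below shows one common style, step decay. The starting rate of 0.1, the halving factor, and the 500-epoch interval are made-up values for illustration; the playground itself uses a fixed rate.

```python
def step_decay(initial_lr, epoch, drop=0.5, epochs_per_drop=500):
    """Multiply the learning rate by `drop` every `epochs_per_drop` epochs."""
    return initial_lr * drop ** (epoch // epochs_per_drop)


for epoch in (0, 499, 500, 1000, 2000):
    print(epoch, step_decay(0.1, epoch))
```

The rate starts high so early training moves quickly, then shrinks so the network can settle instead of overshooting.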

Fun fact: In real-world deep learning, there are entire research papers dedicated just to figuring out better ways to set the learning rate. It's that important!

Experiment 3: Overfitting vs. Regularization

  1. Select the Gaussian dataset (two overlapping blobs).
  2. Use 3 layers with lots of neurons (like 8, 8, 8). Keep L2 at 0.
  3. Train for a while. Notice how the boundary gets really wiggly, trying to perfectly classify every single dot, even the ones deep in the "wrong" territory.
  4. Now reset, set L2 to 0.005, and train again. The boundary is smoother and more sensible: it accepts that some dots near the overlap are just hard to classify.

Lesson: A network that's too powerful will memorize noise. Regularization keeps it honest. This is one of the most important concepts in all of machine learning: the balance between a model that's too simple (underfitting) and one that's too complex (overfitting).

Experiment 4: Activation Function Showdown

  1. Select the Checkerboard dataset with 3 layers (6, 6, 4).
  2. Set activation to ReLU. Train for ~2000 epochs. Pause and look at the boundary shape. Notice the sharp, angular edges.
  3. Reset. Switch to Sigmoid, same architecture. Train again. The boundary is rounder and smoother, and it might learn slower.
  4. Reset. Try Tanh. Often the fastest to converge on this dataset.

Lesson: The activation function shapes the "vocabulary" of boundaries the network can draw. Different activations suit different problems.

Experiment 5: The Impossible Checkerboard

  1. Select the Checkerboard dataset.
  2. Start with 1 layer, 4 neurons. Train. It barely makes a dent!
  3. Try 3 layers (8, 6, 4) with ReLU. Train for a while. It should start to carve out the checkerboard pattern, though it's tough.
  4. This is one of the hardest patterns, and it shows why some problems need really deep networks.
  5. If you're feeling ambitious, try to get the loss below 0.05. It's possible but takes patience and the right architecture!

Lesson: Some patterns are genuinely hard, and no amount of clever settings can replace having enough network capacity. When your network can't learn, sometimes the answer is simply "make it bigger."

Experiment 6: The Minimalist Challenge

  1. Select the Circle dataset, the easiest one.
  2. Use just 1 hidden layer with 2 neurons. Can it learn the circle? (Probably not well!)
  3. Try 1 layer with 3 neurons. Better?
  4. What's the smallest network that can perfectly classify the circle? Try to find it!

Lesson: In real ML, all else being equal, smaller models are often preferable: they're faster, use less memory, and tend to generalize better. Finding the smallest network that solves your problem is an art. This is Occam's Razor applied to machine learning: prefer the simplest model that explains the data.

🎯 Create Your Own Challenge

Once you've done the guided experiments, try setting your own goals:

  • Can you solve Spiral with loss below 0.01? What's the smallest network that achieves it?
  • Can you find a learning rate that works for ALL datasets without changing it?
  • What happens if you set every layer to just 1 neuron? Can any dataset be learned?
  • Try training, pausing, changing the learning rate, then resuming. Does a "learning rate schedule" (starting high, going low) help?
  • What's the maximum number of parameters you can create? (Hint: 6 layers of 12 neurons each!) Does more always mean better?

๐Ÿ“ What You Just Learned

If you've worked through even a few of those experiments, you've already developed real intuition about neural networks. Let's crystallize the key takeaways:

๐Ÿ—บ๏ธ The Bigger Picture

Everything you've experimented with here โ€” choosing architectures, tuning learning rates, fighting overfitting โ€” is exactly what machine learning engineers do at companies like Google, OpenAI, and Meta. The only differences are scale (they use millions of neurons instead of dozens) and data (they train on massive real-world datasets instead of 300 colored dots). But the principles? Identical.

The fact that you can now look at a loss curve and say "learning rate is too high" or see a wiggly boundary and think "needs regularization" means you've developed real ML intuition. That's not something you can learn from reading a textbook โ€” it comes from doing, from playing, from failing and trying again. Which is exactly what you just did.

These aren't just toy concepts โ€” every single one of these settings exists in the real neural networks that power ChatGPT, image recognition, self-driving cars, and more. The networks are bigger (millions or billions of parameters instead of dozens), but the fundamental ideas are identical. You're learning the real thing.

You now have hands-on experience with the core concepts of neural networks: neurons, layers, activation functions, learning rates, and regularization. These are the building blocks of every neural network ever built โ€” from the tiny playground networks you just trained to the massive language models that power AI assistants.

In the next chapter, we'll jump from numbers to words โ€” and see how neural networks can understand language. How do you turn a sentence into numbers that a network can process? How does a neural network learn the meaning of words? ๐Ÿš€