Build any network you want. Choose a dataset, tweak the architecture, and watch it learn in real time.
This is where everything from the previous chapters comes together. The playground below is a sandbox: a place where you can experiment freely, break things, and learn by doing. There are no wrong answers here. Try wild combinations! Make the network fail spectacularly! That's how you build intuition.
🎯 Your goal: By the end of this chapter, you should be able to pick a dataset, design a network architecture, choose sensible settings, and successfully train the network to classify the data. If you can do that for the Spiral dataset, you're officially doing machine learning!
Real machine learning engineers spend a huge amount of time doing exactly this: tweaking settings, running experiments, watching what happens, and building an instinct for what works. This playground gives you that same experience, compressed into a few minutes instead of a few years. Every great ML practitioner started by playing.
Before you start clicking, let's take a tour of what you're looking at. The playground has three panels, each showing you something different about the network:
This is your control center. You'll choose what kind of data the network should learn, how big the network should be, and how it should learn. Think of it as the "settings" for your experiment. Every option is explained in detail below. Don't worry if the names sound intimidating; they're simpler than they sound.
The left panel has four cards: Dataset (pick your puzzle), Architecture (design the network shape), Settings (fine-tune learning behavior), and Controls (play/pause/reset). You'll also see live stats at the bottom showing the current epoch (training step), loss (how wrong the network is), and parameter count (total number of knobs the network is adjusting).
This is the main event! The large colorful canvas shows you what the network "sees." Here's how to read it:
As the network trains, watch the colors shift and flow like liquid. That's the network literally changing its mind about how to classify different regions of space. Early on, the boundary is usually a simple line or blob. As training progresses, it gets more detailed, curving around individual data points. This is the network learning!
If you see the boundary getting too complex (curving wildly between individual dots), that's a sign of overfitting: the network is memorizing instead of learning. We'll talk about how to fix that with regularization below.
The top card shows a diagram of your network: circles are neurons, lines are connections (weights). The thicker or more colorful a connection, the stronger that weight is. When you change the architecture (add layers, adjust neurons), this diagram updates instantly so you can see what you're building.
Below the network diagram is the loss curve, your progress tracker. The x-axis is time (training epochs) and the y-axis is how wrong the network is. You want this line to go down and stay down. When it flattens near zero, your network has learned the pattern. If it bounces around instead of going down, your learning rate is probably too high. If it barely moves, your learning rate is probably too low or your network is too small for the problem.
Below both of those is a narration box that gives you a plain-English description of what's happening during training. It'll tell you when the network is just guessing, when it's making progress, and when it's nailed the dataset. Think of it as your personal ML coach, commenting on your network's performance in real time.
Together, these three panels give you a complete picture: what you configured (left), what the network learned (center), and how it got there (right). This is essentially the same monitoring setup that real ML engineers use, just simplified and visual.
Let's go through each option so you know exactly what you're tweaking. Understanding these settings is the difference between "randomly clicking buttons" and "actually doing machine learning." Each one corresponds to a real decision that ML engineers make every day.
Don't try to memorize all of this before playing. Read through it once, then come back and re-read specific sections as you experiment. Learning by doing is the whole point of this chapter.
The dataset is the shape of data the network needs to learn to separate. Think of it as the "puzzle" you're giving the network. Each dataset places dots on the canvas in a different pattern: some dots are one color (class 0) and others are a different color (class 1). The network's job is to figure out the rule that separates the two groups and draw a boundary between them.
The datasets are ordered roughly by difficulty. Start with Circle to build confidence, then work your way up to the harder ones:
Difficulty ranking: Circle (⭐) → Gaussian (⭐⭐) → XOR (⭐⭐⭐) → Spiral (⭐⭐⭐⭐) → Checkerboard (⭐⭐⭐⭐⭐). If you can get a network to nail the Checkerboard with low loss, you're doing great!
Each time you select a new dataset (or click Regenerate), 300 random data points are placed on the canvas. The randomness means every run is slightly different, which is actually realistic! In real ML, your training data is always a finite, imperfect sample of the true pattern.
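To make "a finite, imperfect sample of the true pattern" concrete, here is a minimal sketch of how two such datasets could be generated. It is not the playground's actual code: the function names, the coordinate range of -1 to 1, and the circle radius of 0.6 are all assumptions chosen for illustration.

```python
import math
import random

def make_circle(n=300, seed=0):
    """Class 1 inside a disc around the origin, class 0 outside it.
    The radius 0.6 is an assumed value, not the playground's."""
    rng = random.Random(seed)
    points = []
    for _ in range(n):
        x, y = rng.uniform(-1, 1), rng.uniform(-1, 1)
        label = 1 if math.hypot(x, y) < 0.6 else 0
        points.append((x, y, label))
    return points

def make_xor(n=300, seed=0):
    """Class 1 when x and y have the same sign, class 0 otherwise."""
    rng = random.Random(seed)
    points = []
    for _ in range(n):
        x, y = rng.uniform(-1, 1), rng.uniform(-1, 1)
        label = 1 if x * y > 0 else 0
        points.append((x, y, label))
    return points

circle_a = make_circle(seed=1)
circle_b = make_circle(seed=2)  # like clicking Regenerate: same rule, new sample
print(len(circle_a), circle_a[:2])
print(len(circle_b), circle_b[:2])
```

The rule that assigns the labels never changes between the two calls. Only the sample of 300 points does, which is why two runs on the "same" dataset can behave slightly differently.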
This is arguably the most important setting: it controls the shape of your network's brain. Specifically, it determines how many hidden layers your network has and how many neurons are in each one.
Use the "+ Layer" and "- Layer" buttons to add or remove layers (up to 6 layers max), and the sliders to set how many neurons each layer has (from 1 to 12). Each layer is labeled L1, L2, L3, etc.
Why does this matter? A single neuron can only draw a straight line. A single layer of neurons can draw multiple straight lines and combine them into simple shapes. But for complex patterns like spirals or checkerboards, you need multiple layers that build on each other, with each layer detecting more abstract features than the last.
Think of it like this: each layer is a level of thinking. One layer can detect simple features ("is the dot on the left or right?"). Two layers can combine those features ("is the dot in the top-left quadrant?"). Three layers can detect even more abstract patterns ("is the dot on this spiral arm?").
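If you'd like to see why one neuron means one straight line, here is a tiny sketch. It assumes a neuron that takes the dot's x and y coordinates, computes a weighted sum plus a bias, and reports which side of zero the sum lands on. The function name and the weight values are made up for illustration.

```python
def neuron_side(x, y, w1=1.0, w2=-1.0, bias=0.0):
    """Return 1 if the point is on the positive side of the line
    w1*x + w2*y + bias = 0, otherwise 0."""
    return 1 if w1 * x + w2 * y + bias > 0 else 0

# With these assumed weights the boundary is the line y = x.
print(neuron_side(1.0, 0.0))  # -> 1 (below the line)
print(neuron_side(0.0, 1.0))  # -> 0 (above the line)
```

Changing the weights tilts the line and changing the bias shifts it, but it always stays a single straight line. Curved boundaries only appear once several neurons and layers are combined.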
More layers = can learn more complex patterns, but takes longer to train and needs more data. More neurons per layer = more nuance at each thinking level. A common beginner mistake is making the network way too big for a simple problem. Start small and add complexity only when needed.
Rule of thumb: Start with 1-2 layers of 4 neurons each. If the network can't learn the pattern after training, add more. If it learns too fast and the boundary looks wiggly and overfit, you might have too many neurons. Try reducing them or adding regularization.
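To get a feel for how quickly a network grows as you add layers and neurons, here is a rough sketch that counts parameters for a fully connected network. It assumes 2 inputs (the dot's x and y), 1 output neuron, and one bias per neuron. The playground's own counter may use different conventions, so treat the exact numbers as illustrative.

```python
def count_parameters(hidden_layers, n_inputs=2, n_outputs=1):
    """Count weights and biases in a fully connected network.
    n_inputs=2 and n_outputs=1 are assumptions about the playground."""
    sizes = [n_inputs] + list(hidden_layers) + [n_outputs]
    total = 0
    for fan_in, fan_out in zip(sizes, sizes[1:]):
        total += fan_in * fan_out + fan_out  # weights plus one bias per neuron
    return total

print(count_parameters([4]))       # -> 17
print(count_parameters([4, 4]))    # -> 37
print(count_parameters([12] * 6))  # -> 829
```

Under these assumptions, going from the rule-of-thumb starting point to the largest allowed network multiplies the number of knobs by more than twenty, which is one reason bigger networks train more slowly and overfit more easily.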
If architecture is the most important structural choice, learning rate is the most important training choice. It controls how big each weight adjustment step is when the network learns.
Imagine you're walking toward a target in a dark room, and after each step someone tells you which direction to go:
Recommended starting point: Try 0.05. This works well for most datasets in this playground. Once you're comfortable, experiment with extremes to see what happens. That's the whole point!
Try cranking the learning rate to 1.0 and watch the loss curve bounce around wildly. Then try 0.001 and watch it barely move. Finding the sweet spot is a real skill in machine learning. In industry, people often try dozens of learning rates to find the best one for their specific problem.
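You can see the same three behaviors on a problem far simpler than the playground. The sketch below runs plain gradient descent on a toy loss, f(w) = w², whose best value is w = 0. The toy loss and the function name are assumptions for illustration; the playground's real loss is more complicated, but the pattern is similar.

```python
def descend(learning_rate, steps=20, start=1.0):
    """Minimize the toy loss f(w) = w**2, whose gradient is 2*w."""
    w = start
    for _ in range(steps):
        w -= learning_rate * 2 * w  # step against the gradient
    return w

for lr in (0.001, 0.05, 1.0):
    print(f"learning rate {lr}: w after 20 steps = {descend(lr):.4f}")
```

With 0.001, w has barely left its starting value of 1.0 after 20 steps. With 0.05 it is most of the way to 0. With 1.0 every step overshoots the target by exactly as much as it missed before, so w flips between 1 and -1 and never settles.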
The activation function is the "squishing function" each neuron uses to decide its output. Without an activation function, the network would just be doing simple linear math, so it could only draw straight-line boundaries no matter how many layers you add. The activation function is what gives neural networks their power to learn curves, spirals, and complex shapes.
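Here is a small sketch of why stacking layers without an activation function buys you nothing. Two layers of pure linear math can always be merged into one equivalent layer, while putting ReLU between them breaks that shortcut. The weight values are arbitrary, the helper names are made up, and biases are left out to keep the example short.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]

def relu(vector):
    return [max(0.0, v) for v in vector]

W1 = [[1.0, -2.0], [0.5, 1.0]]   # first layer weights (arbitrary)
W2 = [[2.0, 1.0], [-1.0, 3.0]]   # second layer weights (arbitrary)
x = [1.0, 2.0]                   # one input point

two_linear_layers = matvec(W2, matvec(W1, x))
one_merged_layer = matvec(matmul(W2, W1), x)
with_relu = matvec(W2, relu(matvec(W1, x)))

print(two_linear_layers)  # -> [-3.5, 10.5]
print(one_merged_layer)   # -> [-3.5, 10.5]
print(with_relu)          # -> [2.5, 7.5]
```

The first two results match exactly, so the "two-layer" network is really a one-layer network in disguise. Only the version with ReLU in the middle does something a single linear layer cannot.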
You met these in earlier chapters, but here's a refresher:
Quick comparison tip: Try the Checkerboard dataset with each activation and the same architecture. ReLU tends to create blocky, grid-like boundaries (which actually suits checkerboard well!). Sigmoid creates smoother, rounder regions.
L2 regularization is a technique that prevents the network from memorizing the training data too exactly. It's like telling a student: "Learn the concepts, not the specific answers to the practice test."
Without regularization, a powerful network might create a super-complex, wiggly boundary that perfectly fits every single training dot, including the noisy or misleading ones. This is called overfitting, and it's one of the biggest problems in machine learning. A network that overfits has essentially memorized the answers instead of learning the pattern, like a student who memorizes the answer key but can't solve a new problem.
L2 regularization adds a small penalty for having large weights, which encourages the network to keep things simpler. Mathematically, it adds the sum of all squared weights (multiplied by a small number) to the loss. The network is now trying to minimize two things at once: the prediction error AND the size of its weights. This forces it to find the simplest boundary that still works.
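In code, the penalty described above is only a couple of lines. This is a minimal sketch with made-up function names and weight values. It uses the plain form "strength times the sum of squared weights"; some libraries include an extra factor of one half, which only changes the scale of the strength setting.

```python
def l2_penalty(weights, strength):
    """Sum of squared weights, scaled by the regularization strength."""
    return strength * sum(w * w for w in weights)

def total_loss(prediction_error, weights, strength):
    """What the network minimizes: prediction error plus the L2 penalty."""
    return prediction_error + l2_penalty(weights, strength)

small_weights = [0.5, -0.5, 0.25]
large_weights = [4.0, -3.0, 2.0]

# Same prediction error, very different total loss.
print(total_loss(0.10, small_weights, strength=0.01))
print(total_loss(0.10, large_weights, strength=0.01))
```

Both weight sets are given the same prediction error here, yet the one with large weights ends up with a noticeably higher total loss. That gap is the pressure that nudges training toward smaller weights and simpler boundaries.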
The sweet spot depends on your network size and dataset complexity. A big network on a simple dataset needs more regularization. A small network on a hard dataset might not need any, since it's already constrained by its size. Experiment and watch the boundary!
Alright, you know what everything does. Now go play! Hit ▶ Train to start training, ⏸ Pause to freeze training so you can examine the boundary, and Reset Network to start fresh with new random weights (the data stays the same). Use Regenerate to create a new random set of data points for the current dataset type.
Pro tip: Pause the training periodically to study the decision boundary. How does it compare to what you'd draw by hand? Are there regions where the network is clearly wrong? What would help: more neurons, more layers, or different settings?
Remember: Every time you hit "Reset Network," the weights are randomized again. This means two identical networks can learn differently! If training seems stuck or weird, just reset and try again. Sometimes you just got unlucky with the initial random weights.
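Here is a small sketch of what "randomized again" means in practice. The function name and the range of -1 to 1 are assumptions for illustration, not the playground's actual initialization scheme.

```python
import random

def random_weights(count, seed):
    """Draw `count` starting weights uniformly from an assumed range of -1 to 1."""
    rng = random.Random(seed)
    return [rng.uniform(-1.0, 1.0) for _ in range(count)]

first_try = random_weights(5, seed=1)
second_try = random_weights(5, seed=2)  # like pressing Reset Network

print(first_try)
print(second_try)
```

Same architecture, same data, different starting point. Because training only ever takes small steps from wherever it starts, two runs can end up following quite different paths.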
Not sure where to start? Here are some guided experiments that will teach you a ton. Work through them in order, since each one builds on the intuition from the previous one. Don't just read them. Actually do each step in the playground above!
⏱️ Time estimate: Each experiment takes 2-5 minutes. The whole set takes about 20 minutes.
Lesson: More layers let the network draw increasingly complex boundaries. But each layer adds training time and parameters.
Lesson: Learning rate is one of the most important settings in machine learning. Too high and the network can't settle down because it keeps overshooting the optimal weights. Too low and you'll be waiting all day. In practice, ML engineers often start with a moderate learning rate and decrease it over time (called a "learning rate schedule"), but for this playground, a fixed rate works fine.
Fun fact: In real-world deep learning, there are entire research papers dedicated just to figuring out better ways to set the learning rate. It's that important!
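If you're curious what a learning rate schedule looks like, here is one common and simple kind, exponential decay, as a minimal sketch. The function name, the starting rate of 0.05, and the decay factor of 0.995 are illustrative assumptions. The playground itself uses a fixed rate.

```python
def exponential_decay(initial_rate, decay_factor, epoch):
    """Learning rate after `epoch` epochs, shrinking by a constant factor each epoch."""
    return initial_rate * decay_factor ** epoch

for epoch in (0, 100, 500):
    rate = exponential_decay(0.05, 0.995, epoch)
    print(f"epoch {epoch}: learning rate = {rate:.5f}")
```

The idea is to take big steps early, when the network is far from a good solution, and smaller steps later, when overshooting would undo careful progress.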
Lesson: A network that's too powerful will memorize noise. Regularization keeps it honest. This is one of the most important concepts in all of machine learning: the balance between a model that's too simple (underfitting) and one that's too complex (overfitting).
Lesson: The activation function shapes the "vocabulary" of boundaries the network can draw. Different activations suit different problems.
Lesson: Some patterns are genuinely hard, and no amount of clever settings can replace having enough network capacity. When your network can't learn, sometimes the answer is simply "make it bigger."
Lesson: In real ML, smaller models are often preferable: they're faster, use less memory, and are less prone to overfitting. Finding the smallest network that solves your problem is an art. This is Occam's Razor applied to machine learning: prefer the simplest model that explains the data.
Once you've done the guided experiments, try setting your own goals:
If you've worked through even a few of those experiments, you've already developed real intuition about neural networks. Let's crystallize the key takeaways:
Everything you've experimented with here (choosing architectures, tuning learning rates, fighting overfitting) is exactly what machine learning engineers do at companies like Google, OpenAI, and Meta. The only differences are scale (they use millions of neurons instead of dozens) and data (they train on massive real-world datasets instead of 300 colored dots). But the principles? Identical.
The fact that you can now look at a loss curve and say "learning rate is too high" or see a wiggly boundary and think "needs regularization" means you've developed real ML intuition. That's not something you can learn from reading a textbook. It comes from doing, from playing, from failing and trying again. Which is exactly what you just did.
These aren't just toy concepts. Every single one of these settings exists in the real neural networks that power ChatGPT, image recognition, self-driving cars, and more. The networks are bigger (millions or billions of parameters instead of dozens), but the fundamental ideas are identical. You're learning the real thing.
You now have hands-on experience with the core concepts of neural networks: neurons, layers, activation functions, learning rates, and regularization. These are the building blocks of every neural network ever built, from the tiny playground networks you just trained to the massive language models that power AI assistants.
In the next chapter, we'll jump from numbers to words and see how neural networks can understand language. How do you turn a sentence into numbers that a network can process? How does a neural network learn the meaning of words?