This is a neuron.

The smallest building block of artificial intelligence.

It takes numbers in.

Every piece of information — a pixel, a word, a sound wave — gets converted to a number. The neuron receives these numbers as inputs.

It multiplies by weights.

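Weights are how the neuron decides what matters. Each input is multiplied by its weight, and the results are added up into a single number, the weighted sum. A high weight means "pay attention to this input." A weight near zero means "ignore it."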
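The multiply-and-add step fits in a few lines of Python. This is a minimal sketch: the function name `weighted_sum` and the input and weight values are made up for illustration, and in a real network the weights are learned rather than typed in.

```python
def weighted_sum(inputs, weights):
    """Multiply each input by its weight and add up the results."""
    if len(inputs) != len(weights):
        raise ValueError("need exactly one weight per input")
    return sum(x * w for x, w in zip(inputs, weights))

# Made-up example: the first input gets a large weight (it matters a lot),
# the second a weight of zero (ignored), the third a small weight.
inputs = [0.9, 0.5, 0.2]
weights = [2.0, 0.0, 0.5]
print(weighted_sum(inputs, weights))  # roughly 1.9 (floating point may add extra digits)
```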

Then it squishes the result.

That weighted sum could be any number. The neuron runs it through a sigmoid curve — squashing it to a value between 0 and 1. Like a confidence score.

Weighted sum 0.0 → σ (sigmoid) → output 0.5
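The squashing curve is the logistic sigmoid, σ(z) = 1 / (1 + e^(-z)). Here is a minimal Python sketch of a whole neuron built from it: weighted sum, then sigmoid. The function names are our own, and the two-branch form of `sigmoid` is a standard trick to avoid overflow for large negative inputs, not something the page specifies.

```python
import math

def sigmoid(z):
    """Logistic sigmoid: squashes any real number into the interval between 0 and 1."""
    # Two algebraically equal forms; each keeps math.exp from overflowing on its side.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def neuron(inputs, weights):
    """One neuron: weighted sum of the inputs, then the sigmoid."""
    z = sum(x * w for x, w in zip(inputs, weights))
    return sigmoid(z)

print(sigmoid(0.0))   # 0.5: a weighted sum of 0 lands exactly in the middle
print(sigmoid(4.0))   # close to 1 (a confident "yes")
print(sigmoid(-4.0))  # close to 0 (a confident "no")
print(neuron([0.9, 0.5, 0.2], [2.0, 0.0, 0.5]))
```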

One neuron is simple.

But stack millions together, and they can learn to read, write, create art, and have conversations.

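In a network diagram, every connection is a weight and every dot is a neuron. GPT-4, one of the models behind ChatGPT, is widely reported to have about 1.8 trillion of them, though OpenAI has not confirmed the figure.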
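Because every connection is a weight, counting weights is just multiplication: in a fully connected network each neuron in one layer connects to every neuron in the next, so two adjacent layers of sizes a and b contribute a × b weights. A small Python sketch; the layer sizes are hypothetical, and bias terms are deliberately not counted, to match the "every connection is a weight" picture.

```python
def count_weights(layer_sizes):
    """Connection weights in a fully connected network (biases not counted)."""
    # Pair each layer with the next one and multiply their sizes.
    return sum(a * b for a, b in zip(layer_sizes, layer_sizes[1:]))

# Hypothetical network: 784 inputs, hidden layers of 128 and 64, 10 outputs.
print(count_weights([784, 128, 64, 10]))  # 109184
```

Even this toy network has over 100,000 weights, which shows how quickly the count grows as layers get wider and more numerous.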

Go deeper.

Eight interactive chapters. From a single neuron to understanding how GPT works.

01 What IS a Neural Network? (build your first neuron)
02 Neurons That Learn (watch gradient descent in action)
03 Layers of Thinking (solve problems one neuron can't)
04 The Playground (full neural network sandbox)
05 How Computers Read (tokens, embeddings, vectors)
06 Paying Attention (the key insight behind ChatGPT)
07 The Transformer (putting it all together)
08 From Tiny to GPT (how small becomes powerful)

Train your own.

This is a real language model: it learns patterns in text and predicts what comes next. GPT works on the same principle, but instead of simple pattern counting it uses a neural network with billions of parameters. The core idea is identical: given some text, what is most likely to come next? Here the unit is a single character; GPT predicts tokens, which are usually pieces of words.

Choose the training data, set the context window (the n-gram order) and the temperature (0.8 here), then generate text.
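The page does not say exactly how its temperature setting works. A common approach, assumed here, is to raise each candidate's probability to the power 1/T and renormalize (equivalent to dividing log-probabilities by T): T below 1 sharpens the distribution toward the most frequent character, T above 1 flattens it toward uniform. A minimal Python sketch; `apply_temperature`, `sample`, and the counts are hypothetical.

```python
import random

def apply_temperature(counts, temperature):
    """Turn next-character counts into probabilities reshaped by temperature.

    Assumes the common rule p_i proportional to count_i ** (1 / T), with T > 0.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = {ch: c ** (1.0 / temperature) for ch, c in counts.items()}
    total = sum(scaled.values())
    return {ch: s / total for ch, s in scaled.items()}

def sample(probs, rng=random):
    """Pick one character at random, weighted by its probability."""
    chars = list(probs)
    return rng.choices(chars, weights=[probs[c] for c in chars], k=1)[0]

counts = {"e": 6, "a": 3, "o": 1}  # made-up counts for a single context
for t in (0.5, 0.8, 1.0, 2.0):
    probs = apply_temperature(counts, t)
    print(t, {ch: round(p, 2) for ch, p in probs.items()})

print(sample(apply_temperature(counts, 0.8), random.Random(0)))
```

With these counts, low temperatures let "e" dominate and high temperatures even the options out, so a setting like 0.8 samples a little more conservatively than the raw frequencies would.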

🔍 What makes LLMs different from older approaches to text completion?

Before neural networks: Markov chains

Long before ChatGPT, people built text generators using a much simpler idea: just count what usually comes next.

A Markov chain reads through a bunch of text and builds a lookup table. For every short sequence of characters it sees (as long as its context window), it records which character came next and how often. Then, to generate text, it keeps looking up "given these last few characters, what usually comes next?" and picks one of the options at random, weighted by how often each appeared.

How it works: a simple example

Say you train on the text "the cat sat on the mat" with a context window of 3 characters:

LOOKUP TABLE
"the" → ' ' (space) 100%
"he " → 'c' 50%, 'm' 50%
"cat" → ' ' 100%
"at " → 's' 50%, 'o' 50%
"sat" → ' ' 100%
...and so on
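Building such a table takes only a few lines. This Python sketch stores it as a dictionary of `collections.Counter` objects (our own data-structure choice, not necessarily how the demo stores its table) and prints the percentages for the rows listed above.

```python
from collections import Counter, defaultdict

def build_table(text, order):
    """Map every `order`-character context to counts of the character after it."""
    table = defaultdict(Counter)
    # Stop `order` characters before the end: the final context has no successor.
    for i in range(len(text) - order):
        context = text[i:i + order]
        table[context][text[i + order]] += 1
    return table

table = build_table("the cat sat on the mat", order=3)
for context in ["the", "he ", "cat", "at ", "sat"]:
    counts = table[context]
    total = sum(counts.values())
    shares = {ch: f"{100 * n / total:.0f}%" for ch, n in counts.items()}
    print(repr(context), shares)
```

In this text "the" appears twice and is followed by a space both times, so its row is 100% space; the 'm' of "mat" follows the context "he ", not "the".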

GENERATION

Start with: "the"

→ Look up "the": pick ' '

→ Look up "he ": pick 'c'

→ Look up "e c": pick 'a'

→ Look up " ca": pick 't'

→ Result: "the cat..."
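The generation loop can be sketched in Python. To keep it short, the table is hand-built from the text "the cat sat on the mat" but includes only the rows this walk-through needs, so generation stops as soon as it reaches a context outside them. `TABLE` and `generate` are illustrative names, not part of the demo.

```python
import random

# Hand-built rows from "the cat sat on the mat" (context window of 3):
# each context maps to counts of the character that followed it.
TABLE = {
    "the": {" ": 2},
    "he ": {"c": 1, "m": 1},
    "e c": {"a": 1},
    " ca": {"t": 1},
    "e m": {"a": 1},
    " ma": {"t": 1},
}

def generate(table, seed, steps, rng=random):
    """Repeatedly look up the last len(seed) characters and sample the next one."""
    order = len(seed)
    text = seed
    for _ in range(steps):
        counts = table.get(text[-order:])
        if not counts:  # context not in the table: nothing to predict
            break
        options = list(counts)
        weights = [counts[ch] for ch in options]
        text += rng.choices(options, weights=weights, k=1)[0]
    return text

for s in range(3):
    print(generate(TABLE, "the", steps=10, rng=random.Random(s)))  # "the cat" or "the mat"
```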

That's it. No weights to train, no neurons: just a big table of "what usually comes next." It's fast and sometimes produces surprisingly readable text, but it has no understanding whatsoever.

Why this breaks down

🔴 Markov chain

Can only look back a few characters
No understanding of meaning
"The president of the United ___" → could say anything
Can't maintain a topic across a sentence

🟢 Neural network (LSTM / Transformer)

Can use the whole context, not just the last few characters
Learns patterns, grammar, meaning
"The president of the United ___" → "States"
Can stay on topic for paragraphs
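The first Markov chain limitation can be made concrete. A chain with a 3-character context window reduces any prompt to its last three characters before looking anything up, so two unrelated prompts can become the same lookup key and receive exactly the same prediction. A tiny Python illustration; the helper name and the second prompt are made up.

```python
def markov_context(prompt, order=3):
    """The only part of the prompt an order-`order` Markov chain ever looks at."""
    return prompt[-order:]

a = "The president of the United"
b = "I was so excited"
print(repr(markov_context(a)), repr(markov_context(b)))  # 'ted' 'ted'
```

A model that reads the whole prompt can tell these two apart; the lookup table never sees the difference.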

The jump from Markov chains to neural networks is the jump from "pattern matching" to something closer to "understanding." Both predict the next token — but the neural network builds an internal model of language itself.

Try it: Markov chain completion

Type below and watch a Markov chain (not a neural network) try to continue your text. Notice how it often starts strong but quickly drifts into nonsense — it has no memory beyond the last few characters.

The Markov chain is great at short phrases because it memorizes exact sequences. But try a longer prompt or switch presets, and it quickly drifts into nonsense: it has zero understanding and only ever sees the last few characters. A small neural network such as an LSTM isn't much better at this tiny scale; GPT gets its power from a different neural network architecture, the Transformer, scaled to billions of parameters.