This is a neuron.
The smallest building block of artificial intelligence.
Every piece of information (a pixel, a word, a sound wave) gets converted to a number. The neuron receives these numbers as inputs.
Weights are how the neuron decides what matters. A high weight means "pay attention to this input." A low weight means "ignore it."
The neuron multiplies each input by its weight and adds up the results. That weighted sum could be any number, so the neuron runs it through a sigmoid curve, squashing it to a value between 0 and 1. Like a confidence score.
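The steps so far (inputs, weights, weighted sum, sigmoid) fit in a few lines of Python. This is a minimal sketch, not the code behind this page: the input and weight values are made up for illustration, and the `bias` term is an assumption that the text above does not mention.

```python
import math

def sigmoid(z: float) -> float:
    """Squash any real number to a value between 0 and 1."""
    # Two branches so math.exp never receives a large positive argument.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def neuron(inputs: list[float], weights: list[float], bias: float = 0.0) -> float:
    """Multiply each input by its weight, add everything up, apply the sigmoid."""
    weighted_sum = sum(x * w for x, w in zip(inputs, weights)) + bias
    return sigmoid(weighted_sum)

# Hypothetical values: the first input matters a lot, the last is nearly ignored.
inputs = [0.9, 0.2, 0.5]
weights = [2.0, -1.0, 0.01]

output = neuron(inputs, weights)
print(output)  # a value between 0 and 1
```

Changing a weight toward zero makes the matching input matter less; a negative weight makes that input push the output down instead of up.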
On its own, a single neuron can't do much. But stack millions together, and they can learn to read, write, create art, and have conversations.
Every connection you see is a weight. Every dot is a neuron. The largest models behind ChatGPT are reported, though not officially confirmed, to have about 1.8 trillion weights.
Eight interactive chapters. From a single neuron to understanding how GPT works.
This is a real language model: it learns patterns in text and predicts what comes next. GPT works on the same principle, but uses a neural network with billions of parameters instead of simple pattern counting. The core idea is identical: given some text, what character is most likely to come next?
Long before ChatGPT, people built text generators using a much simpler idea: just count what usually comes next.
A Markov chain reads through a bunch of text and builds a lookup table. For every sequence of characters it sees, it records which character followed. Then to generate text, it just keeps looking up "given these last few characters, which characters came next?" and picking randomly from the options, so that followers it saw more often get picked more often.
Say you train on the text "the cat sat on the mat" with a context window of 3 characters:
Reading left to right, the first entries in the table are:

"the" -> ' '
"he " -> 'c'
"e c" -> 'a'
" ca" -> 't'

That's it. No learning, no weights, no neurons. Just a big table of "what usually comes next." It's fast and sometimes produces surprisingly readable text, but it has no understanding whatsoever.
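Building that lookup table takes only a few lines. The sketch below is one possible way to do it, not the code running on this page; the function name `build_table` and the parameter name `order` (the context window size) are made up for illustration.

```python
from collections import Counter, defaultdict

def build_table(text: str, order: int = 3) -> dict[str, Counter]:
    """Map each `order`-character context to a count of the characters that followed it."""
    table = defaultdict(Counter)
    for i in range(len(text) - order):
        context = text[i:i + order]    # e.g. "the"
        next_char = text[i + order]    # e.g. " "
        table[context][next_char] += 1
    return dict(table)

table = build_table("the cat sat on the mat", order=3)

print(dict(table["the"]))  # {' ': 2}
print(dict(table["he "]))  # {'c': 1, 'm': 1}
```

"the" appears twice and is followed by a space both times, so the table is certain about it. "he " was followed once by 'c' and once by 'm', so generation would pick between the two at random.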
The jump from Markov chains to neural networks is the jump from "pattern matching" to something closer to "understanding." Both predict the next token, but the neural network builds an internal model of language itself.
Type below and watch a Markov chain (not a neural network) try to continue your text. Notice how it often starts strong but quickly drifts into nonsense: it has no memory beyond the last few characters.
Notice how the Markov chain can produce surprisingly readable text. It's great at short phrases because it memorizes exact sequences. But try a longer prompt or switch presets: it quickly drifts into nonsense because it has zero understanding. The LSTM above isn't much better at this tiny scale, but the same architecture scaled to billions of parameters becomes GPT.
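For readers who want to try this offline, here is a rough sketch of the generation loop: look up the last few characters, pick a follower at random (weighted by how often it was seen), append it, repeat. It is an illustration under assumptions, not the demo's actual code; `build_table`, `generate`, `corpus`, and the tiny training text are all hypothetical.

```python
import random
from collections import Counter, defaultdict

def build_table(text: str, order: int = 3) -> dict[str, Counter]:
    """Map each `order`-character context to counts of the characters that followed it."""
    table = defaultdict(Counter)
    for i in range(len(text) - order):
        table[text[i:i + order]][text[i + order]] += 1
    return dict(table)

def generate(table, seed: str, length: int = 40, order: int = 3, rng=None) -> str:
    """Extend `seed` one character at a time using only the last `order` characters."""
    rng = rng or random.Random()
    out = seed
    for _ in range(length):
        followers = table.get(out[-order:])
        if not followers:
            break  # this context never appeared in the training text
        chars = list(followers.keys())
        counts = list(followers.values())
        # Followers seen more often are proportionally more likely to be picked.
        out += rng.choices(chars, weights=counts, k=1)[0]
    return out

corpus = "the cat sat on the mat"
table = build_table(corpus, order=3)

sample = generate(table, "the", length=30, rng=random.Random(0))
print(sample)
```

Every four-character window of the output is copied from somewhere in the training text, which is why short stretches look fluent. Nothing ties those windows together over a longer span, which is where the drift comes from.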