Watching a neural network learn
· ml, interactive
Every explanation of neural networks eventually shows you the same picture: a decision boundary neatly wrapped around some data. What that picture never shows is the part I find most beautiful — the process. The thousands of tiny gradient steps between “random noise” and “it gets it.”
So instead of a picture, here is the process itself. The network below is not a video or a simulation recording — it is a real multilayer perceptron (2 → 24 → 24 → 1, tanh activations) training live in your browser as you scroll. Around 200 lines of dependency-free JavaScript.
300 points, two intertwined spirals — blue and orange. This is everything the network will ever see. No equations, no hints about the shape. Scroll on.
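For the curious, a dataset like this takes only a few lines to generate. What follows is a minimal sketch, not the code running on this page: the function name `makeSpirals`, the number of turns, and the noise level are all assumptions chosen for illustration. Only the 300-point, two-arm shape comes from the description above.

```javascript
// Hypothetical two-spiral generator. Every name and constant here is illustrative.
function makeSpirals(nPerClass = 150, turns = 1.75, noise = 0.02, rand = Math.random) {
  const points = [];
  for (let label = 0; label < 2; label++) {
    for (let i = 0; i < nPerClass; i++) {
      const t = i / (nPerClass - 1);   // position along the arm, 0..1
      const r = 0.05 + 0.95 * t;       // radius grows as the arm winds outward
      // The second arm is the first one rotated by 180 degrees.
      const theta = turns * 2 * Math.PI * t + label * Math.PI;
      points.push({
        x: r * Math.cos(theta) + noise * (rand() - 0.5),
        y: r * Math.sin(theta) + noise * (rand() - 0.5),
        label,                         // assumed mapping: 0 = blue, 1 = orange
      });
    }
  }
  return points;
}
```

Calling `makeSpirals()` with the defaults returns 300 points, 150 per arm, with coordinates roughly inside the unit circle.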
Meet the network: two inputs, two hidden layers of 24 neurons, one output. Freshly initialized with random weights, its “decision boundary” is confident nonsense.
Gradient descent begins. A couple hundred epochs in, the network is still mostly guessing — watch the accuracy in the corner hover barely above a coin flip. Spirals are hard.
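If you want to see what one of those gradient steps looks like in code, here is a rough sketch of a 2 → 24 → 24 → 1 tanh network with plain per-example gradient descent. It is not the page's actual source. The sigmoid output, cross-entropy loss, initialization scale, learning rate, and all function names are assumptions made for the example.

```javascript
// Sketch of a 2 -> 24 -> 24 -> 1 MLP with tanh hidden layers.
// Assumptions (not from the post): sigmoid output, cross-entropy loss,
// uniform init scaled by 1/sqrt(fan-in), learning rate 0.05.
const SIZES = [2, 24, 24, 1];

function initNetwork(sizes = SIZES, rand = Math.random) {
  return sizes.slice(1).map((nOut, l) => {
    const nIn = sizes[l];
    const scale = 1 / Math.sqrt(nIn);
    return {
      W: Array.from({ length: nOut }, () =>
        Array.from({ length: nIn }, () => (rand() * 2 - 1) * scale)),
      b: new Array(nOut).fill(0),
    };
  });
}

// Forward pass. Returns the activations of every layer, input included,
// because the backward pass needs them.
function forward(net, input) {
  const acts = [input];
  net.forEach((layer, l) => {
    const prev = acts[l];
    const isOutput = l === net.length - 1;
    acts.push(layer.W.map((row, j) => {
      let z = layer.b[j];
      for (let i = 0; i < prev.length; i++) z += row[i] * prev[i];
      return isOutput ? 1 / (1 + Math.exp(-z)) : Math.tanh(z);
    }));
  });
  return acts;
}

// One gradient step on a single example.
// Returns that example's loss as measured before the update.
function trainStep(net, input, target, lr = 0.05) {
  const acts = forward(net, input);
  const p = acts[acts.length - 1][0];
  // With sigmoid + cross-entropy, dLoss/dz at the output is (p - target).
  let delta = [p - target];
  for (let l = net.length - 1; l >= 0; l--) {
    const { W, b } = net[l];
    const prev = acts[l]; // input to layer l
    // Push the error back through this layer using the weights before they change.
    const prevDelta = l === 0 ? null : prev.map((a, i) => {
      let s = 0;
      for (let j = 0; j < W.length; j++) s += W[j][i] * delta[j];
      return s * (1 - a * a); // tanh'(z) = 1 - tanh(z)^2
    });
    for (let j = 0; j < W.length; j++) {
      for (let i = 0; i < prev.length; i++) W[j][i] -= lr * delta[j] * prev[i];
      b[j] -= lr * delta[j];
    }
    delta = prevDelta;
  }
  const eps = 1e-12; // guards against log(0)
  return -(target * Math.log(p + eps) + (1 - target) * Math.log(1 - p + eps));
}

// One epoch: a single pass over every point. Returns the mean loss.
function trainEpoch(net, points, lr = 0.05) {
  let total = 0;
  for (const { x, y, label } of points) total += trainStep(net, [x, y], label, lr);
  return total / points.length;
}
```

Under these assumptions the network has 697 trainable parameters (72 + 600 + 25). An "epoch" in this sketch means one call to `trainEpoch`, which is one update per data point.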
Then, quite suddenly, it clicks. Somewhere past epoch 400 or so, the boundary starts to curl and the loss falls off a cliff. Many learning curves have a moment like this.
A few thousand epochs and the boundary hugs both arms almost perfectly. Everything you just watched was real gradient descent running in your browser — no video, no library, about 200 lines of JavaScript.
Your turn. Pick a color below the canvas and click (or tap) to drop new points — the network never stops training, so watch it bend the boundary around your edits in real time.
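Editing a dataset while training continues needs surprisingly little machinery. The sketch below shows one way it could work, assuming the data lives in a plain array that the training loop re-reads every epoch and that the canvas spans [-1, 1] on both axes. The helpers `pixelToData` and `addPoint` are hypothetical names, not this page's actual code.

```javascript
// Hypothetical helper: map a click in canvas pixels to data coordinates,
// assuming the canvas covers [-1, 1] on both axes.
function pixelToData(px, py, width, height) {
  return {
    x: (px / width) * 2 - 1,
    y: 1 - (py / height) * 2, // canvas y grows downward, so flip it
  };
}

// Append a labeled point to the live dataset. If the training loop iterates
// over this same array each epoch, the new point is picked up on the next
// pass with no restart.
function addPoint(points, px, py, width, height, label) {
  const { x, y } = pixelToData(px, py, width, height);
  points.push({ x, y, label });
  return points;
}
```

In a browser, `addPoint` would typically be called from a `click` or `pointerdown` listener on the canvas, with the pixel position taken relative to the canvas's bounding rectangle.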
A few things worth noticing about what you just watched:
- The long boring plateau is real. For hundreds of epochs the network barely improves, and then the loss collapses. Real training runs, including much larger ones, often show abrupt transitions like this, so a loss curve read without context can be deeply misleading.
- The boundary is smooth because the network is small. Two hidden layers of 24 neurons have little capacity to spare for memorizing noise, so gradient descent settles on the spiral structure instead. Make the network much bigger and it could start carving ugly little islands around individual points.
- Your points at the end were a live distribution shift. When you dropped new points into enemy territory, you watched the model trade off old knowledge against new evidence in real time. The same tension is behind catastrophic forgetting in much larger systems.
This post is mostly a proof of format: prose and a live system sharing the same page, in the spirit of the interactive essays at pudding.cool and ai-2027.com that I admire. The plumbing is now part of this site, so future posts can put a real model, algorithm, or dataset under your scroll wheel with a one-line import. If there’s something you’d like to watch happen rather than read about — tell me.