Watching a neural network learn

Tags: ml, interactive

Every explanation of neural networks eventually shows you the same picture: a decision boundary neatly wrapped around some data. What that picture never shows is the part I find most beautiful — the process. The thousands of tiny gradient steps between “random noise” and “it gets it.”

So instead of a picture, here is the process itself. The network below is not a video or a simulation recording — it is a real multilayer perceptron (2 → 24 → 24 → 1, tanh activations) training live in your browser as you scroll. Around 200 lines of dependency-free JavaScript.

[Interactive canvas with a live readout: epoch, loss, accuracy]

300 points, two intertwined spirals — blue and orange. This is everything the network will ever see. No equations, no hints about the shape. Scroll on.
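The post doesn't show how the points are generated, so here is a minimal sketch of one common way to build a two-spiral dataset of this size. The function names, the number of turns, the noise level, and the mapping of label 0 and 1 to the two colors are all assumptions made for illustration.

```javascript
// Small seeded PRNG (mulberry32) so a run can be repeated exactly.
// Assumption: the real page may simply use Math.random.
function mulberry32(seed) {
  let a = seed >>> 0;
  return function () {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Build two interleaved spiral arms. The second arm is the first one
// rotated by half a turn, which is what makes them intertwine.
function makeSpirals(nPerArm = 150, turns = 1.75, noise = 0.02, rand = Math.random) {
  const points = [];
  for (let arm = 0; arm < 2; arm++) {
    for (let i = 0; i < nPerArm; i++) {
      const t = i / (nPerArm - 1); // 0 at the centre, 1 at the tip
      const r = 0.05 + 0.95 * t; // radius grows along the arm
      const theta = turns * 2 * Math.PI * t + arm * Math.PI;
      points.push({
        x: r * Math.cos(theta) + noise * (rand() * 2 - 1),
        y: r * Math.sin(theta) + noise * (rand() * 2 - 1),
        label: arm, // hypothetical encoding: 0 and 1 stand for the two colors
      });
    }
  }
  return points;
}

const spiralPoints = makeSpirals(150, 1.75, 0.02, mulberry32(7));
console.log(spiralPoints.length, spiralPoints[0]);
```

Keeping the coordinates roughly inside the unit square means the two inputs need no further normalization before they reach the first layer.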

Meet the network: two inputs, two hidden layers of 24 neurons, one output. Freshly initialized with random weights, its “decision boundary” is confident nonsense.
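For readers who want to see that architecture as code, here is a rough sketch of a 2 → 24 → 24 → 1 network with tanh hidden layers. It is not the page's actual source. The initialization scale and the sigmoid on the output neuron are assumptions; the post only says the hidden activations are tanh.

```javascript
const SIZES = [2, 24, 24, 1];

// Create one { W, b } pair per layer. W[j][i] connects input i to neuron j.
// Assumption: uniform weights scaled by 1/sqrt(fan-in), zero biases.
function initNetwork(sizes, rand = Math.random) {
  const layers = [];
  for (let l = 1; l < sizes.length; l++) {
    const nIn = sizes[l - 1];
    const nOut = sizes[l];
    const scale = Math.sqrt(1 / nIn);
    const W = Array.from({ length: nOut }, () =>
      Array.from({ length: nIn }, () => (rand() * 2 - 1) * scale)
    );
    const b = new Array(nOut).fill(0);
    layers.push({ W, b });
  }
  return layers;
}

// Forward pass: tanh on the hidden layers, sigmoid on the single output
// (assumed), so the result reads as the probability of label 1.
function forward(layers, input) {
  let a = input;
  layers.forEach((layer, l) => {
    const isOutput = l === layers.length - 1;
    a = layer.W.map((row, j) => {
      let z = layer.b[j];
      for (let i = 0; i < row.length; i++) z += row[i] * a[i];
      return isOutput ? 1 / (1 + Math.exp(-z)) : Math.tanh(z);
    });
  });
  return a[0];
}

const freshNetwork = initNetwork(SIZES);
console.log(forward(freshNetwork, [0.3, -0.5])); // some value near 0.5 before training
```

With weights this small the untrained output stays close to 0.5 everywhere; a larger initialization scale would produce bolder, more arbitrary regions from the start.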

Gradient descent begins. A couple hundred epochs in, the network is still mostly guessing — watch the accuracy in the corner hover barely above a coin flip. Spirals are hard.
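What one epoch of gradient descent involves can be sketched as follows. This is an illustration under stated assumptions rather than the code behind the canvas: it assumes full-batch updates, a sigmoid output with cross-entropy loss, and a fixed learning rate, none of which the post specifies. The names `initNetwork`, `forwardAll`, and `trainEpoch` are made up for the example.

```javascript
// Assumed init: uniform weights scaled by 1/sqrt(fan-in), zero biases.
function initNetwork(sizes, rand = Math.random) {
  const layers = [];
  for (let l = 1; l < sizes.length; l++) {
    const nIn = sizes[l - 1];
    const W = Array.from({ length: sizes[l] }, () =>
      Array.from({ length: nIn }, () => (rand() * 2 - 1) / Math.sqrt(nIn))
    );
    layers.push({ W, b: new Array(sizes[l]).fill(0) });
  }
  return layers;
}

// Forward pass that keeps every layer's activations for backpropagation.
function forwardAll(layers, input) {
  const acts = [input];
  layers.forEach((layer, l) => {
    const prev = acts[l];
    const isOutput = l === layers.length - 1;
    acts.push(layer.W.map((row, j) => {
      let z = layer.b[j];
      for (let i = 0; i < row.length; i++) z += row[i] * prev[i];
      return isOutput ? 1 / (1 + Math.exp(-z)) : Math.tanh(z);
    }));
  });
  return acts;
}

// One full-batch epoch. Returns the mean loss and accuracy measured
// with the weights as they were before this epoch's update.
function trainEpoch(layers, data, lr) {
  const grads = layers.map(({ W, b }) => ({
    W: W.map((row) => row.map(() => 0)),
    b: b.map(() => 0),
  }));
  let loss = 0;
  let correct = 0;
  for (const { x, y, label } of data) {
    const acts = forwardAll(layers, [x, y]);
    const p = acts[acts.length - 1][0];
    loss -= Math.log((label === 1 ? p : 1 - p) + 1e-12);
    if ((p >= 0.5) === (label === 1)) correct++;
    let delta = [p - label]; // sigmoid + cross-entropy gradient at the output
    for (let l = layers.length - 1; l >= 0; l--) {
      const prev = acts[l];
      const W = layers[l].W;
      const prevDelta = new Array(prev.length).fill(0);
      for (let j = 0; j < W.length; j++) {
        grads[l].b[j] += delta[j];
        for (let i = 0; i < prev.length; i++) {
          grads[l].W[j][i] += delta[j] * prev[i];
          prevDelta[i] += delta[j] * W[j][i];
        }
      }
      // tanh'(z) = 1 - tanh(z)^2, and prev holds tanh outputs when l > 0
      if (l > 0) delta = prevDelta.map((d, i) => d * (1 - prev[i] * prev[i]));
    }
  }
  const n = data.length;
  layers.forEach((layer, l) => {
    for (let j = 0; j < layer.W.length; j++) {
      layer.b[j] -= (lr * grads[l].b[j]) / n;
      for (let i = 0; i < layer.W[j].length; i++) {
        layer.W[j][i] -= (lr * grads[l].W[j][i]) / n;
      }
    }
  });
  return { loss: loss / n, accuracy: correct / n };
}

// Tiny stand-in dataset (not the spirals) just to show the call shape.
const toyData = [
  { x: -0.8, y: -0.6, label: 0 }, { x: 0.9, y: 0.7, label: 0 },
  { x: -0.7, y: 0.8, label: 1 }, { x: 0.6, y: -0.9, label: 1 },
];
const toyNet = initNetwork([2, 24, 24, 1]);
for (let epoch = 0; epoch < 200; epoch++) {
  const stats = trainEpoch(toyNet, toyData, 0.1);
  if (epoch % 50 === 0) console.log(epoch, stats.loss.toFixed(4), stats.accuracy);
}
```

The loss and accuracy returned each epoch are the kind of numbers a corner readout could display.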

Then, quite suddenly, it clicks. Somewhere past epoch 400 or so the boundary starts to curl, and the loss falls off a cliff. Many learning curves have a moment like this.

A few thousand epochs and the boundary hugs both arms almost perfectly. Everything you just watched was real gradient descent running in your browser — no video, no library, about 200 lines of JavaScript.
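A boundary like this is usually drawn by asking the model for a prediction at every cell of a grid and coloring the cell accordingly. The sketch below shows that idea in isolation. The `predict` callback, the grid resolution, the coordinate range, and the exact colors are all placeholders, not details taken from the page.

```javascript
// Sample predict(x, y) at the centre of every cell of a cells-by-cells grid
// covering [-range, range] in both directions. Row 0 is the top of the canvas.
function rasterize(predict, cells = 64, range = 1.2) {
  const grid = new Float32Array(cells * cells);
  for (let row = 0; row < cells; row++) {
    for (let col = 0; col < cells; col++) {
      const x = ((col + 0.5) / cells) * 2 * range - range;
      const y = range - ((row + 0.5) / cells) * 2 * range; // canvas y grows downward
      grid[row * cells + col] = predict(x, y);
    }
  }
  return grid;
}

// Blend linearly between two colors. Assumption: p = 0 maps to orange
// and p = 1 maps to blue; the RGB values are placeholders.
function probabilityToRgb(p) {
  const orange = [255, 140, 0];
  const blue = [30, 110, 255];
  return orange.map((c, k) => Math.round(c + (blue[k] - c) * p));
}

// Stand-in model: a straight vertical boundary at x = 0.
const demoGrid = rasterize((x, y) => (x > 0 ? 1 : 0), 8);
console.log(probabilityToRgb(demoGrid[0]), probabilityToRgb(demoGrid[7]));
```

Re-running the sampling after every few epochs is what would make the boundary appear to move; a coarser grid trades sharpness for frame rate.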

Your turn. Pick a color below the canvas and click (or tap) to drop new points — the network never stops training, so watch it bend the boundary around your edits in real time.
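Dropping a point comes down to converting a click position into data coordinates and appending it to the array the training loop already reads from. Here is a small sketch of that step. The helper names, the coordinate range, and the way the canvas is looked up are assumptions for illustration; the browser wiring is guarded so the helpers can also be run on their own.

```javascript
// Convert a pixel position on the canvas into data coordinates
// in [-range, range]. The range value is an assumption.
function pixelToData(px, py, width, height, range = 1.2) {
  return {
    x: (px / width) * 2 * range - range,
    y: range - (py / height) * 2 * range, // canvas y grows downward
  };
}

// Append a labeled point to the live dataset and return the new size.
// Because training keeps reading the same array, the next epoch sees it.
function addPoint(data, px, py, label, size) {
  const { x, y } = pixelToData(px, py, size.width, size.height);
  data.push({ x, y, label });
  return data.length;
}

// Browser-only wiring (skipped outside a page).
if (typeof document !== "undefined") {
  const canvas = document.querySelector("canvas"); // assumes a single canvas
  const liveData = []; // stand-in for the dataset shared with the trainer
  let currentLabel = 0; // would be set by the color picker
  if (canvas) {
    canvas.addEventListener("pointerdown", (event) => {
      const rect = canvas.getBoundingClientRect();
      addPoint(liveData, event.clientX - rect.left, event.clientY - rect.top, currentLabel, rect);
    });
  }
}
```

Using `pointerdown` covers both mouse clicks and taps with one listener.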

A few things worth noticing about what you just watched:

  1. The long boring plateau is real. For hundreds of epochs the network barely improves, and then the loss collapses. Plateaus followed by sudden drops, often described as phase transitions, show up in real training runs at many scales, which is why staring at loss curves without context can be deeply misleading.

  2. The boundary is smooth because the network is small. Two hidden layers of 24 neurons have little spare capacity to memorize noise, so gradient descent is pushed toward the spiral structure. Make the network much bigger and it could carve ugly little islands around individual points instead.

  3. Your points at the end were a live distribution shift. When you dropped new points into enemy territory, you watched the model trade off old knowledge against new evidence in real time — the same tension behind catastrophic forgetting in much larger systems.
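To put a number on "small" in the second point, the parameter count of a fully connected network follows directly from its layer sizes: each layer contributes inputs × outputs weights plus one bias per output. The helper below is a throwaway illustration, and the ten-times-wider network is just a hypothetical point of comparison.

```javascript
// Count weights plus biases for a fully connected network.
function countParams(sizes) {
  let total = 0;
  for (let l = 1; l < sizes.length; l++) {
    total += sizes[l - 1] * sizes[l] + sizes[l];
  }
  return total;
}

console.log(countParams([2, 24, 24, 1])); // 697
console.log(countParams([2, 240, 240, 1])); // 58801
```

Almost all of the growth comes from the hidden-to-hidden layer, whose weight count scales with the square of the width.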

This post is mostly a proof of format: prose and a live system sharing the same page, in the spirit of the interactive essays at pudding.cool and ai-2027.com that I admire. The plumbing is now part of this site, so future posts can put a real model, algorithm, or dataset under your scroll wheel with a one-line import. If there’s something you’d like to watch happen rather than read about — tell me.