Teaching a CNN to read handwriting

The aims

The project began with a deliberately narrow set of goals:

01 Build a convolutional model that could be trained on a database of images.
02 Process and prepare a database of handwritten-character images to train it on.
03 Train it effectively enough to recognise characters with high accuracy.
04 Point the finished model at a single image file and have it read the character.

The dataset

Almost everything about a model like this comes down to the data, so that is where most of the work went. I trained on the EMNIST ByClass dataset: over 700,000 images spanning 62 classes of handwritten characters (digits, plus upper- and lower-case letters), each one paired with its label.
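To make the 62 classes concrete, here is a minimal sketch of turning a class index, or a list of 62 model scores, back into a character. It assumes the usual ByClass ordering (digits, then upper-case, then lower-case letters); the names `CLASSES`, `index_to_char` and `predict_char` are illustrative and not taken from the project's code.

```python
import string

# Assumed class order: 10 digits, 26 upper-case letters, 26 lower-case letters.
CLASSES = string.digits + string.ascii_uppercase + string.ascii_lowercase


def index_to_char(index: int) -> str:
    """Map a class index (0-61) to the character it stands for."""
    if not 0 <= index < len(CLASSES):
        raise ValueError(f"class index out of range: {index}")
    return CLASSES[index]


def predict_char(scores) -> str:
    """Pick the character whose score is highest (a plain argmax)."""
    if len(scores) != len(CLASSES):
        raise ValueError("expected one score per class")
    best_index = max(range(len(scores)), key=lambda i: scores[i])
    return index_to_char(best_index)


# Hypothetical scores: every class at 0.0 except index 36.
scores = [0.0] * len(CLASSES)
scores[36] = 1.0
print(predict_char(scores))  # a
```

If your copy of the dataset ships its own mapping file, that file should take precedence over the ordering assumed here.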

Every image is a 28×28 grid of single-channel pixels holding one character, stored transposed (rows and columns swapped). The grids are small enough to train on quickly, but detailed enough to tell an a from a q.

The input format

The model never sees a letter. It sees a 28×28 field of brightness values, one channel deep: 784 numbers to find structure in.
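A small sketch of what that means in practice: undoing the transposition and turning one image into 784 numbers. It assumes pixels are 8-bit values (0 to 255) held as a list of rows; the helper names and the synthetic image are made up for illustration.

```python
SIZE = 28


def transpose(image):
    """Swap rows and columns of an image given as a list of rows."""
    return [list(column) for column in zip(*image)]


def to_model_input(image):
    """Scale 0-255 pixel values to 0.0-1.0 and flatten them row by row."""
    return [pixel / 255.0 for row in image for pixel in row]


# Synthetic image: one bright horizontal stroke on row 5, dark elsewhere.
image = [[255 if r == 5 else 0 for _ in range(SIZE)] for r in range(SIZE)]

upright = transpose(image)        # the stroke is now vertical, in column 5
values = to_model_input(upright)  # the flat list of brightness values
print(len(values))  # 784
```

A convolutional model would normally keep the 28×28 layout rather than the flat list; the flattening here is only to show where the 784 comes from.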

The training method

A model that only recognises perfectly centred, upright characters is useless on real handwriting. So I augmented the data, randomly rotating and translating the training images, so the model saw variation and learned the shape of a character rather than memorising one exact rendering of it.
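The augmentation step can be sketched in plain Python as below. This is an illustration under stated assumptions, not the project's pipeline: the limits (`max_degrees=10.0`, `max_shift=2`) are hypothetical, and it uses nearest-neighbour sampling where a real pipeline would more likely lean on a library transform.

```python
import math
import random


def augment(image, max_degrees=10.0, max_shift=2, rng=random):
    """Return a randomly rotated and translated copy of a square image.

    image is a list of rows. Pixels with no source pixel are set to 0.
    """
    size = len(image)
    angle = math.radians(rng.uniform(-max_degrees, max_degrees))
    dx = rng.randint(-max_shift, max_shift)
    dy = rng.randint(-max_shift, max_shift)
    centre = (size - 1) / 2.0
    cos_a, sin_a = math.cos(angle), math.sin(angle)

    out = [[0] * size for _ in range(size)]
    for y in range(size):
        for x in range(size):
            # Inverse mapping: undo the shift, then undo the rotation,
            # to find which source pixel lands on output pixel (x, y).
            xs = x - dx - centre
            ys = y - dy - centre
            src_x = round(cos_a * xs + sin_a * ys + centre)
            src_y = round(-sin_a * xs + cos_a * ys + centre)
            if 0 <= src_x < size and 0 <= src_y < size:
                out[y][x] = image[src_y][src_x]
    return out


# Synthetic sample: a bright 8x8 square in the middle of a 28x28 image.
sample = [
    [255 if 10 <= x < 18 and 10 <= y < 18 else 0 for x in range(28)]
    for y in range(28)
]
augmented = augment(sample, rng=random.Random(0))  # seeded for repeatability
```

Applying a fresh random transform each time an image is drawn means the model rarely sees the exact same rendering twice.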

To keep it from overfitting, I used learning-rate scheduling (easing the step size down as training progressed) together with early stopping (halting before the model started fitting noise in the training set rather than signal). Together they kept the model general, so it could score highly on completely unseen data.
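The two ideas can be sketched without any framework. The schedule shown is step decay, which is only one possible choice; the post does not say which schedule was used. The patience value, the decay settings and the validation losses are all made up to show the control flow.

```python
def step_decay(initial_lr, epoch, drop=0.5, every=5):
    """Multiply the learning rate by `drop` once every `every` epochs."""
    return initial_lr * drop ** (epoch // every)


class EarlyStopping:
    """Signal a stop once validation loss fails to improve `patience` times in a row."""

    def __init__(self, patience=3, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("inf")
        self.bad_epochs = 0

    def should_stop(self, val_loss):
        if val_loss < self.best - self.min_delta:
            # Improvement: remember it and reset the counter.
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


# Made-up validation losses: improving at first, then drifting upwards.
val_losses = [0.90, 0.60, 0.45, 0.44, 0.46, 0.47, 0.48, 0.50]

stopper = EarlyStopping(patience=3)
stopped_at = None
for epoch, loss in enumerate(val_losses):
    lr = step_decay(0.01, epoch)  # would be handed to the optimiser here
    if stopper.should_stop(loss):
        stopped_at = epoch
        break

print(stopped_at)  # 6
```

In a real run the weights from the best epoch (epoch 3 in this toy sequence) would usually be the ones kept, not those from the epoch where training halted.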

95%+

Recognition success on unseen, augmented characters

Was it a success?

I reached an average recognition rate of 95%+ on unseen, augmented data. On characters I drew and fed in myself, it was effectively perfect. I count that a success: the original aim was to point it at an image and have it read the character, and it does.

About the misses

Most of its misses are pairs that are ambiguous without context: K vs k, 0 vs O, 1 vs l. On some of those out-of-context cases, like C vs c, the model could differentiate more reliably than I could.
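One way to surface pairs like these is to tally the (true, predicted) combinations among the errors. The sketch below is illustrative only: the function names are hypothetical and the labels are invented to show the mechanics, not measured results.

```python
from collections import Counter


def top_confusions(true_labels, predicted_labels, n=3):
    """Return the n most frequent (true, predicted) pairs among the misses."""
    misses = Counter(
        (t, p) for t, p in zip(true_labels, predicted_labels) if t != p
    )
    return misses.most_common(n)


def is_case_only_miss(true_char, predicted_char):
    """True when the prediction is the right letter in the wrong case."""
    return (
        true_char != predicted_char
        and true_char.lower() == predicted_char.lower()
    )


# Invented labels, purely for illustration.
true_labels = list("KKOa7")
predicted = list("kk0a7")

print(top_confusions(true_labels, predicted))
# [(('K', 'k'), 2), (('O', '0'), 1)]
```

Separating case-only misses from the rest gives a rough sense of how much of the error is genuinely ambiguous without context.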

Figure: Inference on six held-out EMNIST samples. Each label reads predicted (true), and all six match.

What's next

This model learns from a fixed dataset. It is shown the right answer thousands of times until the patterns stick. The next one learns from its own play. I've started building a model on the matrix logic of 2048 using reinforcement learning: it plays, scores, and adjusts, with no labelled answer to copy, only the consequence of the move it just made.