Recurrent Attention Model for MNIST classification
For the first few epochs, the network performs like this: