Compute-efficient reinforcement learning with binary neural networks and evolution strategies.
Read the writeup on my website!
Binary neural networks have a low memory footprint and run crazy fast.
Let’s use that to speed up reinforcement learning.
For example, a 600-byte network of this kind performs 500,000 evaluations/second on my laptop CPU.
The goal of this project is to train binary neural networks directly in a reinforcement learning environment using
natural evolution strategies.
Binary networks are a great fit for RL because evolution strategies only ever run forward passes: a tiny, fast network makes each candidate evaluation cheap, and no gradients ever need to flow through the non-differentiable sign activation.
The networks implemented here are a modification of XNOR-Nets.
Each layer's weights, inputs, and outputs are constrained to vectors of +/-1 values, which are encoded as binary, and the sign function serves as the nonlinearity. Layers compute the function `f(x) = sign(W^T x + b)`.
In exchange, these networks have an extremely fast forward pass, because the dot product of two binary-encoded +/-1 vectors `x` and `y` is `n_bits - 2 * popcount(x XOR y)`, which can be computed in just a few clock cycles.
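As a quick sanity check, here is that identity in plain Python (an illustrative sketch, not the project's code), with bit = 1 encoding +1 and bit = 0 encoding -1:

```python
# Check: dot(x, y) over +/-1 entries equals n_bits - 2 * popcount(x XOR y).
n_bits = 8
x_bits, y_bits = 0b10110100, 0b10011100  # two packed +/-1 vectors

# Unpack to explicit +/-1 entries (bit = 1 -> +1, bit = 0 -> -1).
x = [1 if (x_bits >> i) & 1 else -1 for i in range(n_bits)]
y = [1 if (y_bits >> i) & 1 else -1 for i in range(n_bits)]

dot = sum(a * b for a, b in zip(x, y))    # ordinary dot product
popcnt = bin(x_bits ^ y_bits).count("1")  # number of differing positions
assert dot == n_bits - 2 * popcnt         # both are 4 here
```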
By baking the subtraction and the bias into the comparison for the sign function, we can speed up inference even more.
Each activation/weight vector is stored as a `uint64_t`, so memory access is very fast and usage is extremely low.
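Putting the pieces together, a layer's forward pass reduces to one XOR, one popcount, and one comparison per neuron. Here is a minimal Python sketch of the idea (the real implementation works on `uint64_t` words in C++/CUDA; all names and numbers below are illustrative):

```python
N_BITS = 64  # one uint64_t worth of +/-1 values per vector


def layer_forward(x: int, weights: list[int], thresholds: list[int]) -> int:
    """Forward pass of one binary layer on a packed input x.

    For neuron i, sign(dot(x, w_i) + b_i) = +1 exactly when
    popcount(x XOR w_i) < (N_BITS + b_i) / 2, so the subtraction and
    the bias are baked into one precomputed threshold per neuron.
    Returns the packed +/-1 output vector.
    """
    out = 0
    for i, (w, t) in enumerate(zip(weights, thresholds)):
        if bin(x ^ w).count("1") < t:  # popcount: one instruction in C
            out |= 1 << i              # neuron i outputs +1
    return out


# Thresholds precomputed from the biases: t_i = (N_BITS + b_i) / 2.
# (Hypothetical values, just to show the shape of the data.)
weights = [0xDEADBEEFDEADBEEF, 0x0123456789ABCDEF]
thresholds = [(N_BITS + b) // 2 for b in (0, 4)]
print(bin(layer_forward(0xFFFFFFFF00000000, weights, thresholds)))
```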
Evolution strategies work by maintaining a search distribution over neural networks. At each step, we sample a new population of networks from the distribution, evaluate each one, then update the search distribution towards the highest-performing samples. Natural evolution strategies do this by following the natural gradient to update the search distribution in the direction of highest expected reward.
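For intuition, here is what one generation of a plain (Gaussian) evolution strategy looks like in numpy. This is the generic scheme, not this project's Bernoulli variant (shown next); all names and hyperparameters are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)


def es_generation(theta, evaluate, pop_size=64, sigma=0.1, lr=0.02):
    """One ES generation: sample a population around theta, evaluate it,
    and move theta toward the perturbations that scored well."""
    noise = rng.standard_normal((pop_size, theta.size))
    rewards = np.array([evaluate(theta + sigma * eps) for eps in noise])
    # Center/scale rewards so the update favors above-average samples.
    adv = (rewards - rewards.mean()) / (rewards.std() + 1e-8)
    # Reward-weighted sum of perturbations: a Monte Carlo estimate of
    # the gradient of expected reward with respect to theta.
    return theta + lr / (pop_size * sigma) * (noise.T @ adv)
```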
To train binary networks with NES, I use a separable Bernoulli distribution, parameterized by
a vector of logits. Each generation’s update computes the closed-form natural gradient with respect to the
bit probabilities, and then backpropagates through the sigmoid function to update the logits.
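A minimal numpy sketch of one generation under my reading of that description (function names and hyperparameters are illustrative, not the project's API):

```python
import numpy as np

rng = np.random.default_rng(0)


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def nes_generation(logits, evaluate, pop_size=64, lr=0.1):
    """One NES generation over a separable Bernoulli distribution.

    `logits` holds one entry per weight bit; `evaluate` maps a sampled
    bit vector (a concrete binary network) to its episode reward.
    """
    p = sigmoid(logits)                        # per-bit probabilities
    bits = rng.random((pop_size, p.size)) < p  # sample the population
    rewards = np.array([evaluate(b) for b in bits])
    adv = (rewards - rewards.mean()) / (rewards.std() + 1e-8)

    # For a separable Bernoulli the Fisher matrix is diagonal, and the
    # natural gradient with respect to the bit probabilities has the
    # closed form E[advantage * (bits - p)].
    natgrad_p = adv @ (bits - p) / pop_size

    # Backpropagate through the sigmoid (dp/dlogit = p * (1 - p)).
    return logits + lr * natgrad_p * p * (1.0 - p)
```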
This project has a fairly large number of heterogeneous dependencies, which makes building somewhat of a pain.
There is a Poetry build hook which should build the whole project by just running `poetry install`.
The full build chain is:

1. `nvcc` compiles the CUDA code.
2. Cython compiles the `.pyx` files into C++ (these wrap the CUDA code in a Python-accessible way).
3. The C++ compiler builds the generated `.cpp` files.

Executing `poetry run python genome/demo.py`
will run a demo that trains a small neural network to balance a pole.
If you’re on a graphical system, it should render episodes periodically as the model learns.
It also dumps logs in `outputs`, which you can inspect with `tensorboard --logdir=outputs` to watch training metrics evolve.
Every run saves its own logs, so each run requires a unique name. If you want to run the same command twice, either delete the old log files under that name or pick a new name.