A basic implementation of imitation learning using Stable Baselines 2.10
Objective: Benchmark reinforcement learning (RL) and imitation learning (GAIL) algorithms from Stable Baselines 2.10 on OpenAI Gym and AirSim environments. More specifically, the goal of this codebase is the following:
Idea: Pick your favourite [task, RL algo] pair -> train RL -> roll out expert data -> train GAIL -> verify imitation (a minimal code sketch of this pipeline is shown just below)
Framework, language, OS: TensorFlow 1.14, Python 3.7, Windows 10
Thesis problem statement: Imitate autonomous UAV maneuvering and landing purely from human demonstrations. We train GAIL on a custom environment built on Microsoft AirSim 2.0. Short video here
The implementation uses Stable Baselines 2.10. ‘utils.py’ is included from here to save hyperparameters as a Dict object
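For a sense of how the pipeline maps onto the Stable Baselines 2.10 API, here is a minimal sketch (this is not the repo's train.py; the Pendulum-v0/SAC pair, timestep counts, and file names are placeholders):

```python
# Minimal sketch of the [task, RL algo] -> expert data -> GAIL pipeline.
import gym
from stable_baselines import SAC, GAIL
from stable_baselines.gail import generate_expert_traj, ExpertDataset

# 1. Train RL to obtain an expert policy
expert = SAC('MlpPolicy', 'Pendulum-v0', verbose=1)
expert.learn(total_timesteps=100000)

# 2. Roll out expert demonstrations to a .npz file
generate_expert_traj(expert, save_path='expert_pendulum', n_episodes=25)

# 3. Train GAIL on the demonstrations
dataset = ExpertDataset(expert_path='expert_pendulum.npz', traj_limitation=-1)
learner = GAIL('MlpPolicy', 'Pendulum-v0', dataset, verbose=1)
learner.learn(total_timesteps=300000)

# 4. Verify imitation by rolling out the learned policy
env = gym.make('Pendulum-v0')
obs = env.reset()
for _ in range(200):
    action, _ = learner.predict(obs, deterministic=True)
    obs, reward, done, _ = env.step(action)
    if done:
        obs = env.reset()
```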
# create virtual environment (optional)
conda create -n myenv python=3.7
conda activate myenv
git clone https://github.com/prabhasak/masters-thesis.git
cd masters-thesis
pip install -r requirements.txt # recommended
pip install stable-baselines[mpi] # MPI needed for TRPO, GAIL
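An optional sanity check that the pinned framework versions resolved correctly:

```python
# Optional post-install check of the framework versions
import tensorflow as tf
import stable_baselines

print(tf.__version__)                 # expect 1.14.x
print(stable_baselines.__version__)   # expect 2.10.x
```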
For CustomEnvs and CustomAlgos: [Register your CustomEnv on Gym](https://medium.com/@apoddar573/making-your-own-custom-environment-in-gym-c3b65ff8cdaa) (examples), and add your custom env and/or algorithm details to the code. You can use the "airsim_env" folder for reference; a registration sketch is shown below
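For reference, registering a custom env with Gym is a one-off call like the following (the module path and class name are placeholders; the repo's "airsim_env" folder shows the real setup):

```python
# Hypothetical registration for a custom env; names and paths are illustrative.
from gym.envs.registration import register

register(
    id='My-Pendulum-v0',                                  # id passed to --env
    entry_point='custom_env.my_pendulum:MyPendulumEnv',   # module:ClassName (placeholder)
    max_episode_steps=200,
)
```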
For AirSim: Some resources to generate custom binary files and modify settings. Binaries for my thesis are available here. You will have to run them before running the code (a quick connection check is sketched below)
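If you want to confirm a binary is up before launching training, a minimal check with the `airsim` Python client (an assumption here, not part of the repo's scripts) looks like:

```python
# Quick check that an AirSim binary is running and reachable
import airsim

client = airsim.MultirotorClient()   # connects to the running AirSim binary
client.confirmConnection()           # raises if no simulator is listening
```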
python train.py --seed 42 --env Pendulum-v0 --algo sac -rl -trl 1e5 -il -til 3e5 -best -check -eval -tb -params-RL learning_starts:1000 -params-IL lam:0.9 vf_iters:10
Exclude `-rl` if expert data is available. For deterministic evaluation of expert data, add `deterministic=True` here. Tuned hyperparameters (HPs) are available on Baselines Zoo. Please read description.txt for info on the sub-folders
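The `-params-RL`/`-params-IL` overrides are plain `key:value` pairs; a rough, illustrative way to turn them into model keyword arguments (the repo's utils.py may handle this differently) is:

```python
# Illustrative parsing of "-params-RL key:value" style overrides into kwargs
from ast import literal_eval

def parse_params(pairs):
    """Turn ['learning_starts:1000', 'lam:0.9'] into {'learning_starts': 1000, 'lam': 0.9}."""
    kwargs = {}
    for pair in pairs:
        key, raw = pair.split(':', 1)
        try:
            kwargs[key] = literal_eval(raw)   # numbers, booleans, tuples
        except (ValueError, SyntaxError):
            kwargs[key] = raw                 # leave anything else as a string
    return kwargs

# e.g. SAC('MlpPolicy', 'Pendulum-v0', **parse_params(['learning_starts:1000']))
```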
Check expert data: `python expert_data_view.py --seed 42 --env Pendulum-v0 --algo sac --episodic`. If `--episodic`, use ‘c’ to go through each episode, and ‘q’ to stop the program
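For reference, the expert data is the .npz file written by Stable Baselines' `generate_expert_traj`, so it can also be inspected directly with NumPy (the file name below is illustrative):

```python
# Inspect the expert .npz saved by generate_expert_traj
import numpy as np

data = np.load('expert_pendulum.npz', allow_pickle=True)
print(data.files)                 # ['actions', 'obs', 'rewards', 'episode_returns', 'episode_starts']
print(data['obs'].shape)          # (total_timesteps, obs_dim)
print(data['episode_returns'])    # one return per recorded episode
```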
Render expert data: `python expert_data_render.py --seed 42 --env My-Pendulum-v0 --algo sac --render`. This is for envs in the "custom_env" folder. If `--episodic`, use ‘c’ to go through each episode, and ‘q’ to stop the program
Evaluate, render model: `python model_render.py --seed 42 --env Pendulum-v0 --algo sac --mode rl -policy`. This verifies the optimality of the trained RL model and the imitation accuracy of the trained GAIL model. Include `--test` to render
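A rough Python equivalent of that optimality check, assuming a saved SAC model and using Stable Baselines' `evaluate_policy` (the model path below is a placeholder):

```python
# Evaluate a trained model over several episodes and report mean return
import gym
from stable_baselines import SAC
from stable_baselines.common.evaluation import evaluate_policy

env = gym.make('Pendulum-v0')
model = SAC.load('path/to/trained_model.zip', env=env)   # path is illustrative
mean_reward, std_reward = evaluate_policy(model, env, n_eval_episodes=10, render=True)
print('mean reward: {:.2f} +/- {:.2f}'.format(mean_reward, std_reward))
```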
The codebase contains TensorBoard and callback features, which help monitor performance during training. You can enable them with `-tb` and with `-check`, `-eval` respectively. TensorBoard: `tensorboard --logdir "/your/file/path"`. The callbacks can also save the best model found during evaluation (`-best`)
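Roughly, these switches correspond to the Stable Baselines 2.10 callbacks and the `tensorboard_log` option; a hedged sketch (paths and frequencies are placeholders, and the repo's train.py wires these up through its own flags):

```python
# Checkpointing, evaluation with best-model saving, and TensorBoard logging
import gym
from stable_baselines import SAC
from stable_baselines.common.callbacks import CheckpointCallback, EvalCallback

eval_env = gym.make('Pendulum-v0')
checkpoint_cb = CheckpointCallback(save_freq=10000, save_path='./checkpoints/')   # -check
eval_cb = EvalCallback(eval_env, best_model_save_path='./best_model/',            # -eval, -best
                       eval_freq=5000, n_eval_episodes=5)

model = SAC('MlpPolicy', 'Pendulum-v0', tensorboard_log='./tb_logs/', verbose=1)  # -tb
model.learn(total_timesteps=100000, callback=[checkpoint_cb, eval_cb])
```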
This is a work in progress (available here), but I hope to release clean code once my research is done!