A basic implementation of imitation learning using Stable Baselines 2.10
Objective: Benchmark reinforcement learning (RL) and imitation learning (GAIL) algorithms from Stable Baselines 2.10 on OpenAI Gym and AirSim environments. More specifically, the goal of this codebase is the following:
Idea: Pick your favourite [task, RL algo] pair -> train RL -> roll out expert data -> train GAIL -> verify imitation (a minimal code sketch of this pipeline is shown just below)
Framework, language, OS: TensorFlow 1.14, Python 3.7, Windows 10
Thesis problem statement: Imitate autonomous UAV maneuvering and landing purely from human demonstrations. We train GAIL on a custom environment built on Microsoft AirSim 2.0. Short video here
The implementation uses Stable Baselines 2.10. ‘utils.py’ is included from here to save hyperparameters as a Dict object
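For a sense of how the pipeline maps onto the Stable Baselines 2.10 API, here is a minimal sketch (this is not the repo's train.py; the Pendulum-v0/SAC pair, timestep counts, and file names are placeholders):

```python
# Minimal sketch of the [task, RL algo] -> expert data -> GAIL pipeline.
import gym
from stable_baselines import SAC, GAIL
from stable_baselines.gail import generate_expert_traj, ExpertDataset

# 1. Train RL to obtain an expert policy
expert = SAC('MlpPolicy', 'Pendulum-v0', verbose=1)
expert.learn(total_timesteps=100000)

# 2. Roll out expert demonstrations to a .npz file
generate_expert_traj(expert, save_path='expert_pendulum', n_episodes=25)

# 3. Train GAIL on the demonstrations
dataset = ExpertDataset(expert_path='expert_pendulum.npz', traj_limitation=-1)
learner = GAIL('MlpPolicy', 'Pendulum-v0', dataset, verbose=1)
learner.learn(total_timesteps=300000)

# 4. Verify imitation by rolling out the learned policy
env = gym.make('Pendulum-v0')
obs = env.reset()
for _ in range(200):
    action, _ = learner.predict(obs, deterministic=True)
    obs, reward, done, _ = env.step(action)
    if done:
        obs = env.reset()
```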
# create virtual environment (optional)
conda create -n myenv python=3.7
conda activate myenv
git clone https://github.com/prabhasak/masters-thesis.git
cd masters-thesis
pip install -r requirements.txt # recommended
pip install stable-baselines[mpi] # MPI needed for TRPO, GAIL
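An optional sanity check that the pinned framework versions resolved correctly:

```python
# Optional post-install check of the framework versions
import tensorflow as tf
import stable_baselines

print(tf.__version__)                 # expect 1.14.x
print(stable_baselines.__version__)   # expect 2.10.x
```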
For CustomEnvs and CustomAlgos: [Register your CustomEnv on Gym](https://medium.com/@apoddar573/making-your-own-custom-environment-in-gym-c3b65ff8cdaa) (examples), and add your custom env and/or algorithm details to the code. You can use the "airsim_env" folder for reference; a registration sketch is shown below
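For reference, registering a custom env with Gym is a one-off call like the following (the module path and class name are placeholders; the repo's "airsim_env" folder shows the real setup):

```python
# Hypothetical registration for a custom env; names and paths are illustrative.
from gym.envs.registration import register

register(
    id='My-Pendulum-v0',                                  # id passed to --env
    entry_point='custom_env.my_pendulum:MyPendulumEnv',   # module:ClassName (placeholder)
    max_episode_steps=200,
)
```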
For AirSim: Some resources to generate custom binary files and modify settings. Binaries for my thesis are available here. You will have to run them before running the code (a quick connection check is sketched below)
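If you want to confirm a binary is up before launching training, a minimal check with the `airsim` Python client (an assumption here, not part of the repo's scripts) looks like:

```python
# Quick check that an AirSim binary is running and reachable
import airsim

client = airsim.MultirotorClient()   # connects to the running AirSim binary
client.confirmConnection()           # raises if no simulator is listening
```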
python train.py --seed 42 --env Pendulum-v0 --algo sac -rl -trl 1e5 -il -til 3e5 -best -check -eval -tb -params-RL learning_starts:1000 -params-IL lam:0.9 vf_iters:10
Exclude `-rl` if expert data is available. For deterministic evaluation of expert data, add `deterministic=True` here. Tuned hyperparameters (HPs) are available on Baselines Zoo. Please read description.txt for info on the sub-folders
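The `-params-RL`/`-params-IL` overrides are plain `key:value` pairs; a rough, illustrative way to turn them into model keyword arguments (the repo's utils.py may handle this differently) is:

```python
# Illustrative parsing of "-params-RL key:value" style overrides into kwargs
from ast import literal_eval

def parse_params(pairs):
    """Turn ['learning_starts:1000', 'lam:0.9'] into {'learning_starts': 1000, 'lam': 0.9}."""
    kwargs = {}
    for pair in pairs:
        key, raw = pair.split(':', 1)
        try:
            kwargs[key] = literal_eval(raw)   # numbers, booleans, tuples
        except (ValueError, SyntaxError):
            kwargs[key] = raw                 # leave anything else as a string
    return kwargs

# e.g. SAC('MlpPolicy', 'Pendulum-v0', **parse_params(['learning_starts:1000']))
```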
Check expert data: `python expert_data_view.py --seed 42 --env Pendulum-v0 --algo sac --episodic`. If `--episodic`, use ‘c’ to go through each episode, and ‘q’ to stop the program
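For reference, the expert data is the .npz file written by Stable Baselines' `generate_expert_traj`, so it can also be inspected directly with NumPy (the file name below is illustrative):

```python
# Inspect the expert .npz saved by generate_expert_traj
import numpy as np

data = np.load('expert_pendulum.npz', allow_pickle=True)
print(data.files)                 # ['actions', 'obs', 'rewards', 'episode_returns', 'episode_starts']
print(data['obs'].shape)          # (total_timesteps, obs_dim)
print(data['episode_returns'])    # one return per recorded episode
```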
Render expert data: `python expert_data_render.py --seed 42 --env My-Pendulum-v0 --algo sac --render`. This is for envs in the "custom_env" folder. If `--episodic`, use ‘c’ to go through each episode, and ‘q’ to stop the program
Evaluate, render model: `python model_render.py --seed 42 --env Pendulum-v0 --algo sac --mode rl -policy`. This verifies the optimality of the trained RL model and the imitation accuracy of the trained GAIL model. Include `--test` to render
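A rough Python equivalent of that optimality check, assuming a saved SAC model and using Stable Baselines' `evaluate_policy` (the model path below is a placeholder):

```python
# Evaluate a trained model over several episodes and report mean return
import gym
from stable_baselines import SAC
from stable_baselines.common.evaluation import evaluate_policy

env = gym.make('Pendulum-v0')
model = SAC.load('path/to/trained_model.zip', env=env)   # path is illustrative
mean_reward, std_reward = evaluate_policy(model, env, n_eval_episodes=10, render=True)
print('mean reward: {:.2f} +/- {:.2f}'.format(mean_reward, std_reward))
```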
The codebase contains TensorBoard and callback features, which help monitor performance during training. You can enable them with `-tb` and with `-check`, `-eval` respectively. TensorBoard: `tensorboard --logdir "/your/file/path"`. The callbacks can also save the best model found during evaluation (`-best`)
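Roughly, these switches correspond to the Stable Baselines 2.10 callbacks and the `tensorboard_log` option; a hedged sketch (paths and frequencies are placeholders, and the repo's train.py wires these up through its own flags):

```python
# Checkpointing, evaluation with best-model saving, and TensorBoard logging
import gym
from stable_baselines import SAC
from stable_baselines.common.callbacks import CheckpointCallback, EvalCallback

eval_env = gym.make('Pendulum-v0')
checkpoint_cb = CheckpointCallback(save_freq=10000, save_path='./checkpoints/')   # -check
eval_cb = EvalCallback(eval_env, best_model_save_path='./best_model/',            # -eval, -best
                       eval_freq=5000, n_eval_episodes=5)

model = SAC('MlpPolicy', 'Pendulum-v0', tensorboard_log='./tb_logs/', verbose=1)  # -tb
model.learn(total_timesteps=100000, callback=[checkpoint_cb, eval_cb])
```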
This is a work in progress (available here), but I hope to release clean code once my research is done!