A reinforcement learning algorithm that combines the N-th order Markov property with abstract Markov decision processes (MDPs), Proximal Policy Optimization (PPO), and a hybrid model-free/model-based approach.
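
To make the N-th order Markov idea concrete, here is a minimal sketch, assuming the state is formed by stacking the last N observations. The description above does not say how the algorithm builds its state, so this is one common construction rather than the project's actual code. The class name `HistoryState` and its methods are hypothetical.

```python
from collections import deque
from typing import Deque, Sequence, Tuple

Observation = Tuple[float, ...]


class HistoryState:
    """Hypothetical helper: keeps the last n observations as one state.

    Under an N-th order Markov assumption, the next observation depends
    only on the most recent n observations, so this tuple can stand in
    for the state that a policy (for example, one trained with PPO) sees.
    """

    def __init__(self, n: int, initial_obs: Sequence[float]) -> None:
        if n < 1:
            raise ValueError("n must be at least 1")
        self.n = n
        # Pad the history with copies of the first observation.
        first = tuple(initial_obs)
        self._buffer: Deque[Observation] = deque([first] * n, maxlen=n)

    def push(self, obs: Sequence[float]) -> Tuple[Observation, ...]:
        """Append a new observation, dropping the oldest, and return the state."""
        self._buffer.append(tuple(obs))
        return self.state()

    def state(self) -> Tuple[Observation, ...]:
        """Return the current history, oldest observation first."""
        return tuple(self._buffer)


if __name__ == "__main__":
    history = HistoryState(n=3, initial_obs=[0.0])
    for value in (1.0, 2.0, 3.0):
        print(history.push([value]))
```

With `n=1` this reduces to the ordinary first-order Markov state, which makes it easy to compare against a standard baseline.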