Python implementation and analysis of two reinforcement learning algorithms: Monte Carlo and temporal-difference (TD) learning
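
As a rough illustration of how the two methods differ, here is a minimal sketch of value prediction on a small random walk. It is not taken from this project's code: the five-state random-walk environment, the function names (`run_episode`, `mc_prediction`, `td0_prediction`), and the step size are all assumptions chosen for the example. The Monte Carlo version updates each visited state toward the full episode return, while TD(0) updates toward the one-step bootstrapped target.

```python
import random

# Hypothetical environment (an assumption for this sketch): a random walk with
# non-terminal states 1..5, terminal states 0 and 6, reward 1 only on reaching 6.
N_STATES = 5
START = 3


def run_episode(rng):
    """Simulate one episode; return a list of (state, reward, next_state)."""
    state = START
    transitions = []
    while state not in (0, N_STATES + 1):
        next_state = state + rng.choice((-1, 1))  # step left or right uniformly
        reward = 1.0 if next_state == N_STATES + 1 else 0.0
        transitions.append((state, reward, next_state))
        state = next_state
    return transitions


def mc_prediction(episodes, alpha=0.005, seed=0):
    """Every-visit, constant-step-size Monte Carlo prediction (undiscounted)."""
    rng = random.Random(seed)
    values = [0.0] * (N_STATES + 2)  # terminal values stay at 0
    for _ in range(episodes):
        g = 0.0
        # Walk the episode backwards so g is the return following each state.
        for state, reward, _ in reversed(run_episode(rng)):
            g += reward
            values[state] += alpha * (g - values[state])
    return values[1:-1]


def td0_prediction(episodes, alpha=0.005, seed=0):
    """TD(0) prediction (undiscounted): bootstrap from the next state's value."""
    rng = random.Random(seed)
    values = [0.0] * (N_STATES + 2)  # terminal values stay at 0
    for _ in range(episodes):
        for state, reward, next_state in run_episode(rng):
            target = reward + values[next_state]
            values[state] += alpha * (target - values[state])
    return values[1:-1]


if __name__ == "__main__":
    true_values = [i / 6 for i in range(1, N_STATES + 1)]
    print("true:", [round(v, 3) for v in true_values])
    print("MC:  ", [round(v, 3) for v in mc_prediction(20000)])
    print("TD0: ", [round(v, 3) for v in td0_prediction(20000)])
```

For this walk the true state values are 1/6 through 5/6, so both estimates can be compared against a known answer. With a constant step size neither estimate settles exactly; each fluctuates around the true values.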