Implementing CNN code in CUDA and OpenCL to evaluate its performance on NVIDIA GPUs, AMD GPUs, and an FPGA platform.