A PyTorch implementation of the paper "Show, Attend and Tell: Neural Image Caption Generation with Visual Attention".