Jupyter Notebook: Project Code
The @WeRateDogs Twitter archive (covering tweets posted through August 1st, 2017) contained basic Tweet data (Tweet ID, timestamp, text, etc.). Udacity provided an “enhanced” CSV file with additional columns extracted programmatically: the rating numerator, the rating denominator, the dog’s name, and the dog stage (doggo, floofer, pupper, or puppo). These columns also needed to be assessed and cleaned, because the extraction was imperfect, owing both to how the data set was structured and to how some of the data was entered into the CSV file.
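As a rough illustration of what assessing the extracted columns can look like, the sketch below flags rows that deserve a manual look. It is not the notebook's actual code. The column names (`rating_denominator`, `name`, and one column per dog stage), the use of the string "None" for a missing stage, and the idea that ratings are normally out of 10 are all assumptions made for this example, and the sample rows are invented.

```python
import pandas as pd

# Assumed layout: one column per dog stage, holding the stage name or "None".
STAGE_COLUMNS = ["doggo", "floofer", "pupper", "puppo"]


def flag_suspect_rows(archive: pd.DataFrame) -> pd.DataFrame:
    """Return one boolean column per check; True marks a row to inspect by hand."""
    flags = pd.DataFrame(index=archive.index)

    # Assumption: ratings are normally "out of 10", so any other denominator
    # may mean the wrong fraction was picked up from the tweet text.
    flags["odd_denominator"] = archive["rating_denominator"] != 10

    # A "name" that starts with a lowercase letter (for example "a" or "the")
    # is more likely an ordinary word than a dog's name.
    flags["suspect_name"] = archive["name"].fillna("").str.match(r"^[a-z]")

    # More than one stage recorded for the same tweet.
    stage_set = archive[STAGE_COLUMNS].ne("None") & archive[STAGE_COLUMNS].notna()
    flags["multiple_stages"] = stage_set.sum(axis=1) > 1

    return flags


# Invented sample rows, only to show the checks running.
sample = pd.DataFrame({
    "rating_numerator": [13, 9, 24],
    "rating_denominator": [10, 11, 7],
    "name": ["Phineas", "a", None],
    "doggo": ["None", "doggo", "None"],
    "floofer": ["None", "None", "None"],
    "pupper": ["None", "pupper", "None"],
    "puppo": ["None", "None", "None"],
})

flags = flag_suspect_rows(sample)
print(flags)
```

Flagging rows rather than fixing them automatically keeps the assessment step separate from cleaning, so each suspect value can be checked against the tweet text before it is changed.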
Finally, the @WeRateDogs archive lacked some useful information, including the retweet count and the favorite count. Python’s Tweepy library was used to query the Twitter API for each Tweet’s JSON data by Tweet ID. Each Tweet’s JSON data was stored in a text file, which was then read line by line into a Pandas DataFrame, keeping only the desired variables: the retweet count and the favorite count. This information was useful for various ranking purposes.
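The store-then-read step described above might look roughly like the following sketch. It leaves out the Tweepy query itself and starts from tweet data that is already available as Python dictionaries. The function names, the file name `tweet_json.txt`, and the sample tweets are hypothetical. The keys `id`, `retweet_count`, and `favorite_count` are assumed to be the field names in each Tweet's JSON.

```python
import json
import os
import tempfile

import pandas as pd


def save_tweets(tweets, path):
    """Write one JSON object per line so the file can be read back line by line."""
    with open(path, "w", encoding="utf-8") as fh:
        for tweet in tweets:
            fh.write(json.dumps(tweet) + "\n")


def load_tweet_counts(path):
    """Read the file line by line and keep only the ID and the two counts."""
    rows = []
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            tweet = json.loads(line)
            rows.append({
                "tweet_id": tweet["id"],
                "retweet_count": tweet["retweet_count"],
                "favorite_count": tweet["favorite_count"],
            })
    return pd.DataFrame(rows, columns=["tweet_id", "retweet_count", "favorite_count"])


# Invented stand-ins for the JSON returned by the API; real tweets carry many more fields.
sample_tweets = [
    {"id": 1001, "retweet_count": 12, "favorite_count": 85, "text": "example one"},
    {"id": 1002, "retweet_count": 340, "favorite_count": 2100, "text": "example two"},
]

with tempfile.TemporaryDirectory() as tmp_dir:
    json_path = os.path.join(tmp_dir, "tweet_json.txt")  # hypothetical file name
    save_tweets(sample_tweets, json_path)
    counts = load_tweet_counts(json_path)

# One possible ranking: most-favorited tweets first.
ranked = counts.sort_values("favorite_count", ascending=False)
print(ranked)
```

Writing one JSON object per line means the API only has to be queried once. The DataFrame can then be rebuilt from the text file whenever a different set of fields is needed.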