Wrangle and Analyse Data
The Project
This project focused on wrangling data from the WeRateDogs Twitter account, which has over 4 million followers and has received international media coverage.
The archival dataset contains basic tweet data (tweet ID, timestamp, text, etc.) for all 5,000+ of their tweets as they stood on 1 August 2017. This is augmented with a second dataset of dog breed predictions for each tweet, generated by a neural network. Finally, the Twitter API is used to gather further information such as favourite and retweet counts.
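The three sources are ultimately joined on tweet ID. A minimal sketch of that join using pandas, with toy stand-in data (the column names and values here are illustrative assumptions, not the project's exact schema):

```python
import pandas as pd

# Toy stand-ins for the three sources (illustrative columns only)
archive = pd.DataFrame({"tweet_id": [1, 2, 3],
                        "text": ["13/10", "12/10", "14/10"]})
predictions = pd.DataFrame({"tweet_id": [1, 2, 3],
                            "breed": ["pug", "corgi", "samoyed"]})
api_data = pd.DataFrame({"tweet_id": [1, 2, 3],
                         "favourite_count": [500, 800, 1200],
                         "retweet_count": [50, 90, 200]})

# Inner join on tweet_id, so only tweets present in all three sources survive
master = archive.merge(predictions, on="tweet_id").merge(api_data, on="tweet_id")
print(master.shape)  # (3, 5)
```

An inner join keeps the master table restricted to tweets with complete information across all three datasets.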
The analysis is conducted using Jupyter Notebook, running on a Python kernel.
What We Learned
How to programmatically download files using the requests library
How to sign up for and use an API
How to use the tweepy library to connect to the Twitter API
How to handle JSON files
How to assess datasets both manually and programmatically
How to define quality and tidiness issues
How to structure a report to document, define, and test the data cleansing process
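Programmatic file download with the requests library can be sketched as below. The function name and the streaming-download approach are one reasonable pattern, not necessarily the project's exact code:

```python
import requests

def download_file(url, path):
    """Stream a file from `url` to `path` on disk."""
    response = requests.get(url, stream=True)
    response.raise_for_status()  # fail loudly on HTTP errors
    with open(path, "wb") as f:
        # Write in chunks so large files don't have to fit in memory
        for chunk in response.iter_content(chunk_size=8192):
            f.write(chunk)
    return path
```

Streaming with `iter_content` matters for larger downloads, since it avoids loading the whole response body into memory at once.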
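Twitter API responses arrive as JSON, which the standard-library json module can parse and serialise. A small sketch using a fabricated tweet record (the fields shown are assumptions about the shape of the data, and the one-object-per-line file format is a common convention rather than the project's confirmed layout):

```python
import json

# A fabricated tweet record, shaped loosely like a Twitter API response
raw = '{"id": 123, "favourite_count": 500, "retweet_count": 42}'

tweet = json.loads(raw)          # JSON text -> Python dict
print(tweet["favourite_count"])  # 500

# Store one JSON object per line, a common format for API dumps
with open("tweet_json.txt", "w") as f:
    f.write(json.dumps(tweet) + "\n")

# Read the file back, one record per line
with open("tweet_json.txt") as f:
    tweets = [json.loads(line) for line in f]
print(len(tweets))  # 1
```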
The Code and the Report
GitHub repository for the data and the Jupyter Notebook
The PDF report can also be found here