Wrangle and Analyse Data

The Project

This project focuses on wrangling data from the WeRateDogs Twitter account, which has over 4 million followers and has received international media coverage.

The archive dataset contains basic tweet data (tweet ID, timestamp, text, etc.) for all 5000+ of the account's tweets as they stood on August 1, 2017. It is augmented with a second dataset containing dog-breed predictions for each tweet, generated by a neural network. Finally, the Twitter API is used to gather further information such as favourite and retweet counts.

The analysis is conducted in a Jupyter Notebook running a Python kernel.

What We Learned

  • How to programmatically download files using the requests library

  • How to sign up for and use an API

  • How to use the tweepy library to connect to the Twitter API

  • How to handle JSON files

  • How to assess datasets both manually and programmatically

  • How to define Quality and Tidiness issues

  • How to structure a report to document, define, and test the data cleansing process
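
As a rough illustration of the JSON-handling and programmatic-assessment points above, the following is a minimal sketch rather than the project's actual code. It assumes the tweet data has been saved as one JSON object per line, and the sample records, the parse_tweets and assess helpers, and the output column names are all hypothetical. Only the standard library is used so that the example runs on its own.

```python
import json
from collections import Counter

# Hypothetical sample data: one JSON object per line, mimicking saved tweet records.
SAMPLE_LINES = [
    '{"id": 1, "favorite_count": 120, "retweet_count": 30}',
    '{"id": 2, "favorite_count": null, "retweet_count": 5}',
    '{"id": 2, "favorite_count": null, "retweet_count": 5}',
]


def parse_tweets(lines):
    """Parse line-delimited JSON into flat dicts holding only the fields of interest."""
    records = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        tweet = json.loads(line)
        records.append({
            "tweet_id": tweet["id"],
            "favourites": tweet.get("favorite_count"),
            "retweets": tweet.get("retweet_count"),
        })
    return records


def assess(records):
    """Return simple quality indicators: row count, duplicated IDs, missing values per field."""
    id_counts = Counter(r["tweet_id"] for r in records)
    duplicate_ids = sorted(i for i, n in id_counts.items() if n > 1)
    missing = Counter()
    for record in records:
        for field, value in record.items():
            if value is None:
                missing[field] += 1
    return {
        "rows": len(records),
        "duplicate_ids": duplicate_ids,
        "missing": dict(missing),
    }


records = parse_tweets(SAMPLE_LINES)
report = assess(records)
print(report)
```

On the sample data, the report flags one duplicated tweet ID and two missing favourite counts, which are the kind of quality issues that would then be documented, cleaned, and re-tested.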

The Code and the Report
