Wine Quality Dataset
The Project
Part of the Udacity Data Analysis Nanodegree, this case study is a gentle introduction to working within the Jupyter notebook environment. The dataset records several chemical properties of each wine alongside a quality rating. The task is to determine which of those properties are associated with highly rated wines.
What We Learned
Using the plot function to build histograms
Using the plot function to build a scatter plot
Changing the figsize of a chart to make it more readable, and adding a ';' to the end of the line to suppress the unwanted text output
Appending data frames together in Pandas
Renaming data frame columns in Pandas
Using GroupBy and Query in Pandas to aggregate and group selections of data
Creating bar charts using the popular packages matplotlib and seaborn
Adding labels, titles, and colour to visualisations
Computing proportions so that groups of different sizes can be compared on a relative basis
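The Pandas steps in the list above can be sketched roughly as follows. This is a minimal illustration rather than the notebook's actual code: the two small data frames, their values, and the column names (alcohol, quality, color) are made up for the example, and pd.concat is used for the appending step because DataFrame.append was removed in pandas 2.0.

```python
import pandas as pd

# Tiny made-up samples standing in for the red and white wine data.
# The values are illustrative only, not taken from the real dataset.
red = pd.DataFrame({
    "alcohol": [9.4, 9.8, 11.2, 10.5],
    "quality": [5, 5, 6, 7],
})
white = pd.DataFrame({
    "alcohol": [8.8, 9.5, 10.1, 12.0],
    "Quality": [6, 6, 5, 7],  # inconsistent label, fixed below
})

# Rename columns so both frames share the same labels.
white = white.rename(columns={"Quality": "quality"})

# Tag each frame with its colour, then stack them into one frame.
red["color"] = "red"
white["color"] = "white"
wine = pd.concat([red, white], ignore_index=True)

# groupby: mean quality rating for each colour.
mean_quality = wine.groupby("color")["quality"].mean()

# query: keep only wines at or above the median alcohol content.
median_alcohol = wine["alcohol"].median()
high_alcohol = wine.query("alcohol >= @median_alcohol")

# Proportions: divide the count of each quality rating by the total
# number of wines of that colour, so groups of different sizes can be
# compared on a relative basis.
counts = wine.groupby(["color", "quality"]).size()
totals = wine.groupby("color").size()
red_proportions = counts["red"] / totals["red"]
white_proportions = counts["white"] / totals["white"]

print(mean_quality)
print(red_proportions)
```

In a notebook, a call such as red_proportions.plot(kind="bar", figsize=(8, 6)); would then chart the result, with the trailing ';' suppressing the text that would otherwise be printed above the figure.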
The Code and the Report
GitHub repository for the data
The report, written as a Jupyter Notebook, can be found here
References
UCI Wine Quality Data Set: https://archive.ics.uci.edu/ml/datasets/Wine+Quality
Plot Documentation: https://matplotlib.org/users/pyplot_tutorial.html