Loan Data From Prosper
What affects a borrower’s APR or interest rate? That is a question millions of borrowers want answered. This project makes a small attempt at answering it using data from the peer-to-peer (p2p) lending firm Prosper.
The Project
The data set contains 113,937 loans with 81 variables on each loan, including loan amount, borrower rate (or interest rate), current loan status, borrower income, and many others. A data dictionary is also available, explaining variables in the data set. In conjunction with domain knowledge, the data dictionary provides a basis from which a subset of variables is chosen for further exploration.
The analysis was conducted using Jupyter Notebook running on a Python kernel.
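As a rough illustration of the variable-selection step, the sketch below reads a small table with pandas and keeps only a few columns. The inline CSV, its values, and the column names (`loan_amount`, `borrower_rate`, `loan_status`, `borrower_income`, `listing_id`) are made-up stand-ins chosen for this example; they are not the real Prosper file or its schema.

```python
import io

import pandas as pd

# Hypothetical three-row stand-in for the loan data; values are invented.
csv_text = """listing_id,loan_amount,borrower_rate,loan_status,borrower_income
A1,4000,0.158,Completed,3083.33
B2,10000,0.092,Current,6125.00
C3,15000,0.275,Chargedoff,2875.00
"""

loans = pd.read_csv(io.StringIO(csv_text))

# Assumed subset of variables to carry forward into the exploration.
columns_of_interest = [
    "loan_amount",
    "borrower_rate",
    "loan_status",
    "borrower_income",
]

# .copy() so later edits to the subset do not touch the full table.
subset = loans[columns_of_interest].copy()

print(subset.shape)
print(subset.dtypes)
```

With the real data, `io.StringIO(csv_text)` would be replaced by the path to the downloaded CSV, and the column list by names taken from the data dictionary.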
What We Learned
Understanding univariate exploration with histograms and bar charts
Using a log transformation to make trends in the data visible
Adjusting axis limits to focus on the bulk of a distribution
Bivariate exploration using correlation and scatter plots to understand linear relationships between quantitative variables
Using box plots to compare the distribution of a quantitative variable across the levels of a qualitative variable
Multivariate exploration with shape, size and colour encodings
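Several of the techniques listed above can be sketched in a few lines of matplotlib. The example below is a minimal illustration on synthetic data, not the notebook's actual code: the column names `monthly_income` and `borrower_rate`, the random data, and the 99th-percentile cut-off are all assumptions made for the example.

```python
import numpy as np
import pandas as pd
import matplotlib

matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt

# Synthetic, right-skewed "income" and a loosely related "rate" (assumed names).
rng = np.random.default_rng(0)
n = 1000
income = rng.lognormal(mean=8.5, sigma=0.6, size=n)
rate = np.clip(0.35 - 0.025 * np.log(income) + rng.normal(0, 0.03, n), 0.01, 0.5)
loans = pd.DataFrame({"monthly_income": income, "borrower_rate": rate})

# Log transformation: bin edges evenly spaced in log10 units.
bins = np.logspace(
    np.log10(loans["monthly_income"].min()),
    np.log10(loans["monthly_income"].max()),
    30,
)

fig, (ax_hist, ax_scatter) = plt.subplots(1, 2, figsize=(10, 4))

# Univariate view: histogram drawn on a log-scaled x-axis.
ax_hist.hist(loans["monthly_income"], bins=bins)
ax_hist.set_xscale("log")
ax_hist.set_xlabel("monthly_income (log scale)")
ax_hist.set_ylabel("count")

# Bivariate view: scatter plot, with the x-axis limited to the bulk of the data.
upper = loans["monthly_income"].quantile(0.99)
ax_scatter.scatter(loans["monthly_income"], loans["borrower_rate"], alpha=0.2)
ax_scatter.set_xlim(0, upper)
ax_scatter.set_xlabel("monthly_income")
ax_scatter.set_ylabel("borrower_rate")

# Pearson correlation as a numeric summary of the linear relationship.
pearson_r = loans["monthly_income"].corr(loans["borrower_rate"])
print(f"Pearson r: {pearson_r:.3f}")

fig.tight_layout()
```

Setting the axis limit only changes what is displayed; the points beyond the cut-off stay in the data and still contribute to the correlation.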
The Code and the Report
GitHub repository for the data and the Jupyter Notebook
The PDF report can also be found here
References
Exporting to a CSV: https://stackoverflow.com/questions/22872952/set-file-path-for-to-csv-in-pandas
Remove legend from the plot: https://stackoverflow.com/questions/5735208/remove-the-legend-on-a-matplotlib-figure
Seaborn Colour Palettes: https://seaborn.pydata.org/tutorial/color_palettes.html