Loan Data From Prosper

What affects a borrower’s APR or interest rate? Millions of borrowers want an answer to that question. This project makes a small but meaningful attempt at one, using data from the peer-to-peer (p2p) lending firm Prosper.

The Project

The data set contains 113,937 loans with 81 variables on each loan, including loan amount, borrower rate (or interest rate), current loan status, borrower income, and many others. A data dictionary is also available, explaining variables in the data set. In conjunction with domain knowledge, the data dictionary provides a basis from which a subset of variables is chosen for further exploration.

The analysis was conducted using Jupyter Notebook running on a Python kernel.
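
As a rough illustration of the variable selection described above, the sketch below loads only a handful of columns with pandas. The column names are assumptions to be checked against the data dictionary, and the three rows are made-up placeholders standing in for the real file, so that the snippet runs on its own.

```python
import io

import pandas as pd

# Placeholder rows standing in for the real CSV; every value here is made up.
SAMPLE_CSV = io.StringIO(
    "ListingKey,LoanOriginalAmount,BorrowerAPR,BorrowerRate,LoanStatus,StatedMonthlyIncome\n"
    "a1,9425,0.165,0.158,Completed,3083.33\n"
    "a2,10000,0.120,0.092,Current,6125.00\n"
    "a3,3001,0.283,0.275,Completed,2083.33\n"
)

# Variables of interest (assumed names; confirm them in the data dictionary).
COLUMNS = ["LoanOriginalAmount", "BorrowerAPR", "LoanStatus", "StatedMonthlyIncome"]

# usecols keeps only the chosen subset and ignores the remaining columns.
loans = pd.read_csv(SAMPLE_CSV, usecols=COLUMNS)

# Loan status is qualitative, so store it as a categorical type.
loans["LoanStatus"] = loans["LoanStatus"].astype("category")

print(loans.shape)
print(loans.dtypes)
```

With the real data, the `io.StringIO` object would be replaced by the path to the downloaded CSV file.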

What We Learned

  • Understanding univariate exploration with histograms and bar charts

  • Using a log transformation to make trends in the data visible

  • Adjusting axis limits to focus on the bulk of a distribution

  • Bivariate exploration using correlation and scatter plots to understand linear relationships between quantitative variables

  • Using box plots to compare the distribution of a quantitative variable across the levels of a qualitative variable

  • Multivariate exploration with shape, size and colour encodings
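
Three of the techniques listed above (a log transformation, adjusted axis limits, and a correlation between two quantitative variables) can be sketched together as follows. This is a minimal illustration, not the notebook's actual code: `income` and `rate` are synthetic values generated here, and the bin width and percentile cut-offs are arbitrary choices.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)

# Synthetic, right-skewed stand-in for a variable such as monthly income.
income = rng.lognormal(mean=8.5, sigma=0.6, size=1000)
# Synthetic rate that falls, with noise, as log-income rises.
rate = 0.9 - 0.08 * np.log(income) + rng.normal(0, 0.03, size=1000)

# Bin edges evenly spaced in log10 units give equal-width bars on a log axis.
log_min, log_max = np.log10(income.min()), np.log10(income.max())
log_edges = 10 ** np.arange(log_min, log_max + 0.1, 0.1)

fig, (ax_hist, ax_scatter) = plt.subplots(1, 2, figsize=(10, 4))

# Univariate view: histogram on a log-scaled x axis.
ax_hist.hist(income, bins=log_edges)
ax_hist.set_xscale("log")
ax_hist.set_xlabel("income (log scale)")
ax_hist.set_ylabel("count")

# Bivariate view: scatter plot, with limits set to the 1st-99th percentiles
# so the plot focuses on the bulk of the distribution.
low, high = np.percentile(income, [1, 99])
ax_scatter.scatter(income, rate, alpha=0.3, s=10)
ax_scatter.set_xscale("log")
ax_scatter.set_xlim(low, high)
ax_scatter.set_xlabel("income (log scale)")
ax_scatter.set_ylabel("rate")

# Pearson correlation between the log-transformed variable and the rate.
r = np.corrcoef(np.log10(income), rate)[0, 1]
print(f"correlation: {r:.2f}")
```

Computing the correlation on the log-transformed values matches what the log-scaled scatter plot shows; on the raw values the relationship would look curved and the coefficient would understate it.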

The Code and the Report

