Predicting Catalogue Demand
Project Overview
This project builds a linear regression model to predict how much money a company can expect to earn from sending a catalogue to new customers. The prediction is used to give management a recommendation on the expected return from the expenditure.
Step 1: Business & Data Understanding
Business
The costs of printing and distributing are $6.50 per catalogue.
The average gross margin (price - cost) on all products sold through the catalogue is 50%.
Data
p1-customers.xlsx - This dataset includes information on 2,300 existing customers.
p1-mailinglist.xlsx - This dataset covers the 250 new customers and is used to estimate how much incremental revenue the company can expect from sending them the catalogue.
Step 2: Analysis, Modelling, and Validation
Step 3: Recommendation
Step 1: Business & Data Understanding
Qs: What decisions need to be made?
Should the company send a catalogue to 250 new customers? To decide, predict how much money the company can expect to earn from the mailing. The expense is justified only if the expected profit is at least $10,000.
Qs: What data is needed to inform these decisions?
The customer profile of existing customers. Customer profile relates to any information that may help build an understanding of customer segments.
Step 2: Analysis, Modelling, and Validation
Qs: How are the predictor variables chosen?
For a variable to be considered a candidate, it must be present in both the training and test data set. This results in the following predictor variables for consideration:
- Name, Customer_Segment, Customer_ID, Address, City, State, ZIP, Store_Number, Avg_Num_Products_Purchased, #_Years_as_Customer
- Each value of the variables 'Name', 'Customer_ID', 'Address', 'State' and 'ZIP' is unique, so these variables cannot be used to explain differences in the target variable.
- The variables 'City', 'Store_Number', and '#_Years_as_Customer' are not statistically significant at the 0.05 level and are therefore also removed from consideration.
That leaves the variables 'Customer_Segment' and 'Avg_Num_Products_Purchased'.
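The first screening rule (a candidate must appear in both data sets) can be sketched in Python as an order-preserving intersection. The column lists are typed in by hand from the list above, with the product-count column spelled as in the regression equation, rather than read from the spreadsheets. The helper name candidate_predictors is hypothetical, and it is an assumption that the target 'Avg_Sales_Amount' exists only for existing customers.

```python
def candidate_predictors(training_columns, scoring_columns):
    """Return the columns present in both data sets, in training order."""
    scoring = set(scoring_columns)
    return [col for col in training_columns if col in scoring]


# Columns shared by both files, as listed in this report.
shared_columns = [
    "Name", "Customer_Segment", "Customer_ID", "Address", "City",
    "State", "ZIP", "Store_Number", "Avg_Num_Products_Purchased",
    "#_Years_as_Customer",
]

# Assumption: the target is only known for existing customers.
training_columns = shared_columns + ["Avg_Sales_Amount"]
scoring_columns = list(shared_columns)

candidates = candidate_predictors(training_columns, scoring_columns)
print(candidates)
```

The uniqueness and significance checks described above would then be applied to this candidate list.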
Qs: Explain why your linear model is a ‘good’ model.
The adjusted R-squared is 0.83, meaning the predictor variables 'Customer_Segment' and 'Avg_Num_Products_Purchased' explain roughly 83% of the variation in 'Avg_Sales_Amount'. Both predictor variables are statistically significant at the 0.001 level.
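For reference, adjusted R-squared discounts plain R-squared for the number of predictors in the model. The sketch below implements the standard formula. It assumes all 2,300 existing customers were used to fit the model and counts four predictors (three segment indicators plus the product count). The plain R-squared value passed in is purely illustrative, since this report states only the adjusted figure.

```python
def adjusted_r_squared(r_squared, n_obs, n_predictors):
    """Standard adjusted R-squared: 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    return 1.0 - (1.0 - r_squared) * (n_obs - 1) / (n_obs - n_predictors - 1)


# Illustrative input only: 0.84 is NOT a figure from this report.
illustrative = adjusted_r_squared(r_squared=0.84, n_obs=2300, n_predictors=4)
print(round(illustrative, 4))
```

With thousands of observations and only a handful of predictors, the adjustment is small, so the adjusted and plain values are nearly the same.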
Qs: What is the best linear regression equation based on the available data?
Y = 303.46 - 149.36 * Customer_SegmentLoyalty Club Only - 281.84 * Customer_SegmentLoyalty Club Only and Credit Card - 245.42 * Customer_SegmentStore Mailing List + 66.98 * Avg_Num_Products_Purchased + 0 * Customer_SegmentCredit Card Only
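To make the equation concrete, here is a minimal Python sketch that evaluates it for a single customer. The coefficients and segment labels are copied from the equation above. 'Credit Card Only' carries a coefficient of 0 because it acts as the baseline segment. The function name predict_avg_sales is hypothetical.

```python
INTERCEPT = 303.46
PRODUCTS_COEF = 66.98  # per unit of Avg_Num_Products_Purchased

# Segment offsets relative to the baseline segment (coefficient 0).
SEGMENT_COEF = {
    "Credit Card Only": 0.0,
    "Loyalty Club Only": -149.36,
    "Loyalty Club Only and Credit Card": -281.84,
    "Store Mailing List": -245.42,
}


def predict_avg_sales(segment, avg_num_products):
    """Evaluate the regression equation for one customer."""
    return INTERCEPT + SEGMENT_COEF[segment] + PRODUCTS_COEF * avg_num_products


baseline_example = predict_avg_sales("Credit Card Only", 3)
mailing_list_example = predict_avg_sales("Store Mailing List", 2)
print(round(baseline_example, 2), round(mailing_list_example, 2))
```

Only one segment term is active for any given customer, which is why a dictionary lookup is enough to stand in for the indicator variables.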
Step 3: Recommendation
Qs: What is your recommendation? Should the company send the catalogue to these 250 customers?
The expected profit contribution is $21,987.45, which is above the minimum profit threshold of $10,000. The company should therefore send the catalogue to the 250 new customers.
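The arithmetic behind this comparison can be sketched as follows. It assumes profit is expected revenue times the 50% gross margin, minus $6.50 for each of the 250 catalogues. The report does not spell out how expected revenue was derived from the model's predictions, so the revenue figure here is back-calculated from the reported profit rather than taken from the analysis. The function names are hypothetical.

```python
COST_PER_CATALOGUE = 6.50
GROSS_MARGIN = 0.50
NUM_CUSTOMERS = 250
PROFIT_THRESHOLD = 10_000.00
REPORTED_PROFIT = 21_987.45


def expected_profit(expected_revenue):
    """Profit = revenue * gross margin - total catalogue cost (assumed form)."""
    return expected_revenue * GROSS_MARGIN - COST_PER_CATALOGUE * NUM_CUSTOMERS


def revenue_needed_for(target_profit):
    """Invert expected_profit: the revenue required to reach a target profit."""
    return (target_profit + COST_PER_CATALOGUE * NUM_CUSTOMERS) / GROSS_MARGIN


implied_revenue = revenue_needed_for(REPORTED_PROFIT)    # implied, not reported
required_revenue = revenue_needed_for(PROFIT_THRESHOLD)  # minimum to justify sending
send_catalogue = expected_profit(implied_revenue) >= PROFIT_THRESHOLD
print(round(implied_revenue, 2), round(required_revenue, 2), send_catalogue)
```

Under these assumptions, the mailing only needs to generate enough revenue for half of it to cover the $1,625.00 catalogue cost plus the $10,000 threshold.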