Predicting Catalogue Demand

Project Overview

This project predicts how much money a company can expect to earn from sending a catalogue to new customers. It builds a linear regression model to give management a recommendation on the expected return from the expenditure.


 

Step 1: Business & Data Understanding 

Business

  • The cost of printing and distributing is $6.50 per catalogue.

  • The average gross margin (price less cost, as a share of price) on all products sold through the catalogue is 50%.

Data

p1-customers.xlsx - This dataset includes information on 2,300 existing customers. 

p1-mailinglist.xlsx - This dataset is used to estimate how much incremental revenue the company can expect from sending the catalogue to the 250 new customers.

Step 2: Analysis, Modelling, and Validation

Step 3: Make a recommendation


 

Step 1: Business & Data Understanding

Qs: What decisions need to be made?

Predict how much money the company can expect to earn from sending a catalogue to 250 new customers. Is the expense justified, that is, does the expected profit reach at least $10,000?

 

Qs: What data is needed to inform these decisions?

Profiles of existing customers are needed. A customer profile is any information that may help build an understanding of customer segments.

 

Step 2: Analysis, Modelling, and Validation

Qs: How are the predictor variables chosen? 

For a variable to be considered a candidate, it must be present in both the training and test data sets. This leaves the following predictor variables for consideration:

-Name, Customer_Segment, Customer_ID, Address, City, State, ZIP, Store_Number, Avg_Num_Products_Purchased, #_Years_as_Customer

-Every value of the variables 'Name', 'Customer_ID', 'Address', 'State' and 'ZIP' is unique, so these variables cannot help explain variation in the target variable.

-The variables 'City', 'Store_Number', and '#_Years_as_Customer' are not statistically significant at the 0.05 level and are therefore also removed from consideration.

That leaves the variables 'Customer_Segment' and 'Avg_Num_Products_Purchased'.
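The first two screening steps can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the column `Hypothetical_Score` and the three toy rows are made up for the example and are not taken from the project files, and the significance test that removes the remaining variables is not reproduced here.

```python
# Columns listed in the write-up as present in both data sets.
shared_columns = [
    "Name", "Customer_Segment", "Customer_ID", "Address", "City", "State",
    "ZIP", "Store_Number", "Avg_Num_Products_Purchased", "#_Years_as_Customer",
]

# Assumption: each file also has columns the other lacks. "Hypothetical_Score"
# is an invented name used only to show the filtering step.
training_columns = shared_columns + ["Avg_Sales_Amount"]
mailing_columns = shared_columns + ["Hypothetical_Score"]


def candidate_predictors(train_cols, score_cols):
    """Keep only columns that appear in both data sets, in training order."""
    score_set = set(score_cols)
    return [col for col in train_cols if col in score_set]


def drop_all_unique(rows, columns):
    """Drop columns in which every row holds a distinct value."""
    kept = []
    for col in columns:
        values = [row[col] for row in rows]
        if len(set(values)) < len(values):
            kept.append(col)
    return kept


# Toy rows (made up) to show how an identifier column is screened out.
toy_rows = [
    {"Customer_ID": 1, "Customer_Segment": "Credit Card Only", "Avg_Num_Products_Purchased": 3},
    {"Customer_ID": 2, "Customer_Segment": "Credit Card Only", "Avg_Num_Products_Purchased": 1},
    {"Customer_ID": 3, "Customer_Segment": "Store Mailing List", "Avg_Num_Products_Purchased": 3},
]

candidates = candidate_predictors(training_columns, mailing_columns)
kept = drop_all_unique(
    toy_rows, ["Customer_ID", "Customer_Segment", "Avg_Num_Products_Purchased"]
)
print(candidates)
print(kept)
```

In the toy rows, 'Customer_ID' is dropped because no value repeats, while the two remaining columns survive the screen.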

[Figure: PredictCatalogDemandScatter.png]

Qs: Explain why your linear model is a ‘good’ model. 

The adjusted R-squared is 0.83, meaning the predictor variables 'Customer_Segment' and 'Avg_Num_Products_Purchased' explain about 83% of the variation in 'Avg_Sales_Amount' after adjusting for the number of predictors. Both predictor variables are statistically significant at the 0.001 level.
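For readers unfamiliar with the adjustment, the standard formula is 1 - (1 - R²)(n - 1)/(n - p - 1), where n is the number of observations and p the number of fitted predictors. The sketch below assumes n = 2,300 (the customer count given above) and p = 4 (three segment indicators plus the product count). The unadjusted value of 0.83 passed in is a placeholder for illustration, not a figure reported by the model.

```python
def adjusted_r_squared(r_squared, n_obs, n_predictors):
    """Standard adjusted R-squared: penalises R-squared for extra predictors."""
    if n_obs <= n_predictors + 1:
        raise ValueError("need more observations than predictors plus one")
    return 1.0 - (1.0 - r_squared) * (n_obs - 1) / (n_obs - n_predictors - 1)


# Assumed inputs: 0.83 is a placeholder unadjusted R-squared, n = 2300, p = 4.
example = adjusted_r_squared(0.83, 2300, 4)
print(round(example, 4))
```

With this many observations and so few predictors the adjustment is tiny, which is why the adjusted and unadjusted values would be nearly identical for this model.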

 

Qs: What is the best linear regression equation based on the available data? 

Y = 303.46 - 149.36 * Customer_SegmentLoyalty Club Only - 281.84 * Customer_SegmentLoyalty Club Only and Credit Card - 245.42 * Customer_SegmentStore Mailing List + 0 * Customer_SegmentCredit Card Only + 66.98 * Avg_Num_Products_Purchased
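To make the equation concrete, here is a minimal sketch that evaluates it for a single customer. The coefficients are copied from the equation as written. Each segment term is treated as a 0/1 indicator, with 'Credit Card Only' as the baseline segment (coefficient 0). The function name `predict_avg_sale` is a hypothetical helper, not part of the original project.

```python
INTERCEPT = 303.46
PRODUCTS_COEF = 66.98  # per unit of Avg_Num_Products_Purchased

# Segment coefficients as written in the equation; exactly one applies per customer.
SEGMENT_COEF = {
    "Credit Card Only": 0.0,  # baseline segment
    "Loyalty Club Only": -149.36,
    "Loyalty Club Only and Credit Card": -281.84,
    "Store Mailing List": -245.42,
}


def predict_avg_sale(segment, avg_num_products):
    """Evaluate the regression equation for one customer."""
    if segment not in SEGMENT_COEF:
        raise KeyError(f"unknown customer segment: {segment!r}")
    return INTERCEPT + SEGMENT_COEF[segment] + PRODUCTS_COEF * avg_num_products


# A baseline-segment customer who buys 3 products on average:
# 303.46 + 0 + 66.98 * 3 = 504.40
baseline_example = predict_avg_sale("Credit Card Only", 3)
mailing_example = predict_avg_sale("Store Mailing List", 3)
print(round(baseline_example, 2), round(mailing_example, 2))
```

Because the segments are mutually exclusive, a lookup table stands in for the four indicator variables.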

 

Step 3: Recommendation

Qs: What is your recommendation? Should the company send the catalogue to these 250 customers?

The expected profit contribution is $21,987.45, which is above the minimum profit threshold of $10,000. The company should therefore send the catalogue to the 250 new customers.
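The arithmetic behind this decision can be sketched as follows, assuming profit equals total expected revenue times the 50% gross margin, minus $6.50 for each of the 250 catalogues. The write-up does not state how total expected revenue was derived from the model's predictions, so the sketch only works backwards from the reported profit. The function names are hypothetical.

```python
COST_PER_CATALOGUE = 6.50
GROSS_MARGIN = 0.50
NUM_CUSTOMERS = 250
PROFIT_THRESHOLD = 10_000.00


def expected_profit(total_expected_revenue):
    """Assumed profit rule: margin on revenue minus total mailing cost."""
    mailing_cost = COST_PER_CATALOGUE * NUM_CUSTOMERS
    return total_expected_revenue * GROSS_MARGIN - mailing_cost


def revenue_needed(profit):
    """Invert the rule: revenue required to reach a given profit."""
    mailing_cost = COST_PER_CATALOGUE * NUM_CUSTOMERS
    return (profit + mailing_cost) / GROSS_MARGIN


mailing_cost = COST_PER_CATALOGUE * NUM_CUSTOMERS      # 1625.0
break_even_revenue = revenue_needed(PROFIT_THRESHOLD)  # 23250.0
implied_revenue = revenue_needed(21_987.45)            # about 47224.90

print(mailing_cost, break_even_revenue, round(implied_revenue, 2))
print(expected_profit(implied_revenue) >= PROFIT_THRESHOLD)
```

Under this rule the campaign needs about $23,250 in expected revenue to reach the threshold, and the reported profit implies roughly twice that amount.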

 

 
