Selecting a Predictive Analytical Framework
Data-Poor Environment
A/B Tests: If there is not enough usable data to solve the problem, we can run an experiment to collect the data we need. In a business context, such an experiment is usually called an A/B test.
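As a rough illustration of how the results of such an experiment might be evaluated, the sketch below compares the conversion rates of two variants with a two-proportion z-test. This is one common way to analyze an A/B test, not a method prescribed by these notes. The function name and the visitor and conversion counts are hypothetical.

```python
import math

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Compare the conversion rates of variants A and B (two-sided z-test)."""
    rate_a = conv_a / n_a
    rate_b = conv_b / n_b
    # Pooled rate under the assumption that both variants perform the same
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_b - rate_a) / std_err
    # Two-sided p-value from the standard normal CDF, written with math.erf
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return rate_a, rate_b, z, p_value

# Hypothetical experiment: 1,000 visitors saw each variant
rate_a, rate_b, z, p_value = two_proportion_z_test(100, 1000, 130, 1000)
print(f"A: {rate_a:.1%}  B: {rate_b:.1%}  z = {z:.2f}  p = {p_value:.3f}")
```

A small p-value suggests the difference between the variants is unlikely to be due to chance alone. The threshold for "small" is a choice the analyst makes before running the experiment.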
Data-Rich Environment
Target Variables: The target variable is the outcome we are trying to predict. To select a suitable predictive model, we first determine whether the target variable is numeric or non-numeric, because the type of target variable determines which kind of model is appropriate. Let’s start with numeric variables.
Numeric vs Non-Numeric
Assuming we have enough data to proceed with the analysis, the next decision is to look at the outcome we’re trying to predict and determine if it’s a numeric outcome or a non-numeric outcome.
Numeric outcomes are those where the outcome is a number. Predicting the demand for electricity or the hourly temperature are both examples of numeric outcomes. Models that predict numeric outcomes are called regression models. As an example, imagine that a manufacturer wants to use historical production data to know how many cycles they’ll need to produce over the next six months to meet expected demand. Because the outcome the manufacturer wants to predict is a number, the target variable is numeric, and they would use a numeric (regression) model to solve this problem.
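To make the idea of a regression model concrete, here is a minimal sketch that fits a straight line to past values by ordinary least squares and uses it to predict the next value. Simple linear regression is only one of many regression models, and the monthly demand figures and function names below are invented for illustration.

```python
def fit_line(xs, ys):
    """Fit y = intercept + slope * x by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

def predict(intercept, slope, x):
    """Return the fitted numeric outcome for input x."""
    return intercept + slope * x

# Hypothetical history: units demanded in months 1 through 6
months = [1, 2, 3, 4, 5, 6]
demand = [120, 135, 150, 160, 178, 190]

intercept, slope = fit_line(months, demand)
forecast = predict(intercept, slope, 7)  # numeric prediction for month 7
print(f"Forecast for month 7: {forecast:.1f} units")
```

The key point is that the output is a number on a continuous scale, which is what distinguishes a regression model from a classification model.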
Types of Numeric Variables
The three most common types of numeric variables are continuous, time-based, and count.
Continuous: A continuous variable can take on any value in a range. For instance, your height can be measured to many decimal places, because people do not grow in whole-inch increments.
Time-Based: A time-based numeric variable is one where you are trying to predict what will happen over time. This is often related to forecasting.
Count: Count variables are discrete, non-negative integers (a count can be zero). They’re called count variables because they measure things you can count, such as the number of items sold.
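The three types above can be told apart with a rough heuristic, sketched below. It assumes the analyst already knows whether the values are ordered in time, since that cannot be inferred from the numbers alone. The function name and its `indexed_by_time` parameter are hypothetical.

```python
def numeric_kind(values, indexed_by_time=False):
    """Roughly label a numeric target as time-based, count, or continuous."""
    if indexed_by_time:
        # Observations ordered in time usually call for forecasting methods
        return "time-based"
    # Whole numbers that are never negative look like counts (zero is allowed)
    if all(float(v).is_integer() and v >= 0 for v in values):
        return "count"
    return "continuous"

print(numeric_kind([0, 3, 12]))                          # support tickets per day
print(numeric_kind([170.2, 181.75, 165.0]))              # heights in centimeters
print(numeric_kind([410, 395, 430], indexed_by_time=True))  # monthly sales
```

This is only a first-pass check. A variable such as age in whole years would be labeled a count here even though an analyst might reasonably treat it as continuous.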
Non-Numeric Variables
A non-numeric variable is often referred to as categorical because it takes on one of a limited set of discrete values. Examples include whether an electronic device will fail before 1000 hours or not, whether a customer will pay on time, pay late, or default on a payment, or whether a store is classified as large, medium, or small. Models that predict non-numeric outcomes are called classification models.
For example, a bank wants to use historical data of their existing clients to predict whether a new customer will default on a loan, always pay on time, or sometimes pay. Since the outcome the bank is trying to predict is a category that the new customer will fall into, they would use a non-numeric or classification model to solve this problem.
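As an illustration of what a classification model does, the sketch below assigns a new customer to the category of the most similar existing client (a 1-nearest-neighbor classifier). This is just one simple classification technique chosen for brevity. The features (a scaled income value and a number of past late payments) and all data values are made up.

```python
import math

def nearest_neighbor_label(training_rows, new_point):
    """Return the label of the training row closest to new_point."""
    closest_features, closest_label = min(
        training_rows, key=lambda row: math.dist(row[0], new_point)
    )
    return closest_label

# Hypothetical existing clients: ((scaled income, past late payments), category)
clients = [
    ((1.0, 0.0), "always pays on time"),
    ((0.6, 2.0), "sometimes pays"),
    ((0.3, 5.0), "defaults"),
]

new_customer = (0.35, 4.0)
print(nearest_neighbor_label(clients, new_customer))
```

Unlike a regression model, the prediction here is a category label rather than a number.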
Classification Models: Binary and Non-Binary
When modelling categorical variables, the number of possible outcomes is an important consideration. If there are only two possible categorical outcomes, such as Yes or No, or True or False, the variable can be described as Binary.
If there are more than two possible categorical outcomes, such as small, medium, or large, the variable can be described as non-binary.
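The decisions described in this section can be summarized in a short sketch. It assumes that a target made up entirely of numbers is a numeric outcome, which will be wrong when numbers are used as category codes. The function name and the returned labels are illustrative only.

```python
from numbers import Number

def choose_framework(has_enough_data, target_values):
    """Walk the decision path: data-poor vs data-rich, numeric vs not, binary vs not."""
    if not has_enough_data:
        # Data-poor environment: run an experiment to collect data
        return "A/B test"
    # Booleans are treated as categories (True/False), not as numbers
    is_numeric = all(
        isinstance(v, Number) and not isinstance(v, bool) for v in target_values
    )
    if is_numeric:
        return "regression"
    n_categories = len(set(target_values))
    if n_categories > 2:
        return "non-binary classification"
    return "binary classification"

print(choose_framework(False, []))
print(choose_framework(True, [12.5, 14.0, 13.2]))
print(choose_framework(True, ["Yes", "No", "Yes"]))
print(choose_framework(True, ["small", "medium", "large"]))
```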