Some models predict a continuous variable. For example, can we predict a student’s test score based on hours studied?
Outcomes:
A simple tool for predicting a numerical value is correlation. Correlation measures the strength and direction of a linear relationship between two continuous variables.
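As a minimal sketch of how a correlation could be computed, the snippet below implements the Pearson correlation coefficient by hand. The function name `pearson_r` and the hours/scores data are hypothetical and only serve to illustrate the student example.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations, and sums of squared deviations.
    co_dev = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sq_dev_x = sum((x - mean_x) ** 2 for x in xs)
    sq_dev_y = sum((y - mean_y) ** 2 for y in ys)
    return co_dev / math.sqrt(sq_dev_x * sq_dev_y)

# Hypothetical data: hours studied and test scores for five students.
hours = [1, 2, 3, 4, 5]
scores = [52, 60, 61, 75, 80]

r = pearson_r(hours, scores)
print(round(r, 3))  # close to +1: a strong positive linear relationship
```

A value near +1 or -1 indicates a strong linear relationship, and the sign gives the direction. The sketch does not handle a constant input, where the denominator would be zero.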
Interpretation:
Correlation is not causation! Two variables may be correlated, but that does not mean one causes the other. There may be a third variable causing both, or it may be a coincidence.

In a classical approach to statistics, we quantify uncertainty with p-values.
p-value: the probability of observing data at least as extreme as ours, assuming the null hypothesis is true
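One way to make the p-value definition concrete is a permutation test, sketched below. This technique is not described in these notes and is used here only as an illustration. Shuffling one variable simulates the null hypothesis of no relationship, and the p-value is estimated as the share of shuffles that produce a correlation at least as strong as the observed one. The function names and data are hypothetical.

```python
import math
import random

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    co_dev = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sq_dev_x = sum((x - mean_x) ** 2 for x in xs)
    sq_dev_y = sum((y - mean_y) ** 2 for y in ys)
    return co_dev / math.sqrt(sq_dev_x * sq_dev_y)

def permutation_p_value(xs, ys, n_permutations=5000, seed=0):
    """Estimate a two-sided p-value under the null of no association."""
    rng = random.Random(seed)  # fixed seed so the result is repeatable
    observed = abs(pearson_r(xs, ys))
    shuffled = list(ys)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)  # break any real link between xs and ys
        if abs(pearson_r(xs, shuffled)) >= observed:
            at_least_as_extreme += 1
    # Add 1 to both counts so the estimate is never exactly zero.
    return (at_least_as_extreme + 1) / (n_permutations + 1)

# Hypothetical data: hours studied and test scores for eight students.
hours = [1, 2, 3, 4, 5, 6, 7, 8]
scores = [50, 55, 53, 62, 66, 70, 68, 79]

p = permutation_p_value(hours, scores)
print(p)  # a small value suggests the data would be unusual under the null
```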
In newer ML approaches, we will measure error by splitting our data into training and test sets. After training our model on the training set, we will evaluate it using the test set. This will be covered in more detail in later modules.
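A rough sketch of such a split, using only the standard library, might look like the following. The helper name `train_test_split`, the 20% test fraction, and the data are assumptions for illustration, and later modules may use a library routine instead.

```python
import random

def train_test_split(rows, test_fraction=0.2, seed=0):
    """Shuffle the rows and hold out a fraction of them for testing."""
    rng = random.Random(seed)  # fixed seed so the split is repeatable
    shuffled = list(rows)      # copy so the caller's data is not reordered
    rng.shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

# Hypothetical (hours studied, test score) pairs for ten students.
data = [(h, 50 + 5 * h) for h in range(1, 11)]

train_set, test_set = train_test_split(data)
print(len(train_set), len(test_set))  # 8 2
```

The model would be fit on `train_set` only, and its error measured on `test_set`, which it has not seen.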
A correlation has both a strength (how closely the points follow a straight line) and a direction (positive or negative):
There are a variety of problems that can affect correlation:
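Imagine we are trying to predict a customer’s total sales based on advertisements viewed. We find a positive correlation and fit a line with a slope of 0.5, meaning that for every 2 additional advertisements a person sees, their predicted total sales increase by 1. Note that this rate of change comes from the slope of the fitted line; the correlation coefficient alone only describes how strong and in which direction the linear relationship is.
We will test this model. We need to calculate the error, or the difference between the actual result and our model's prediction.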
We have 3 customers:
RMSE (root mean squared error) is the square root of the average of the squared errors. Here is a good reference.
Calculate the squared difference of each point and add them up: (10 - 10)^2 + (12 - 10)^2 + (8 - 10)^2 = 8
Divide by the number of observations, and take the square root: (8 / 3) ^ .5 ≈ 1.63
We may also want to see the difference between our prediction and actual values: (10 - 10), (12 - 10), (8 - 10) → (0, 2, -2)
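So, the RMSE is the square root of the mean squared error. In this example the prediction (10) equals the mean of the values, so the RMSE matches the square root of their variance. It is roughly the typical distance between predicted and actual values, in the same units as the outcome.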
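The RMSE arithmetic can be checked with a short sketch. It assumes that 10, 12, and 8 are the actual values for the three customers and that the model predicts 10 for each of them; the variable and function names are hypothetical.

```python
import math

def rmse(actual, predicted):
    """Root mean squared error between actual and predicted values."""
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Assumed reading of the example: three actual values, one constant prediction.
actual = [10, 12, 8]
predicted = [10, 10, 10]

errors = [a - p for a, p in zip(actual, predicted)]
print(errors)                              # [0, 2, -2]
print(sum(e ** 2 for e in errors))         # 8
print(round(rmse(actual, predicted), 2))   # 1.63
```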
The coefficient of determination tells us the proportion of variance in our dependent variable that can be explained by our independent variables.
It typically ranges from 0 to 1, although it can be negative when a model predicts worse than simply using the mean. Generally, the higher the number, the better the prediction. This will be more fully explained in the regression sections.
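As a hedged illustration, the coefficient of determination can be computed as one minus the ratio of the residual sum of squares to the total sum of squares. The function name `r_squared` and the data below are hypothetical.

```python
def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    # Variation the model fails to explain.
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    # Total variation of the actual values around their mean.
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot

# Hypothetical actual values and model predictions.
actual = [10, 12, 8, 14]
predicted = [10.5, 11.5, 8.5, 13.5]

print(round(r_squared(actual, predicted), 2))  # 0.95
```

Here the model leaves 1 unit of squared error out of a total variation of 20, so it explains about 95% of the variance. Predicting the mean for every observation would give a value of 0.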