Some models predict a yes/no (categorical) variable. This is called classification. We evaluate such a model by comparing its predictions against actual results. For example, how well does our model predict which students will pass a class?
Measuring error is done with a confusion matrix, which compares predicted values against actual values.
As an example, imagine we are hunters trying to find deer in the forest. Are we looking at a deer (positive) or a person (negative)?
Our model can be very simple: if we see something gray, it's a deer; if we see anything else, it's a person. How well does this model work?
| | Predicted Positive ("yes, deer") | Predicted Negative ("no deer") |
|---|---|---|
| Actual Positive = Deer | True Positive (TP) | False Negative (FN) |
| Actual Negative = Person | False Positive (FP) | True Negative (TN) |
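To make the counting concrete, here is a minimal Python sketch that tallies a confusion matrix from predicted and actual labels. The observation data is made up purely for illustration.

```python
# Tally a confusion matrix for the deer/person example.
# The observations below are hypothetical, just to show how counting works.
actual    = ["deer", "deer", "person", "deer", "person", "person"]
predicted = ["deer", "person", "deer", "deer", "person", "person"]

counts = {"TP": 0, "FN": 0, "FP": 0, "TN": 0}
for a, p in zip(actual, predicted):
    if a == "deer" and p == "deer":
        counts["TP"] += 1      # predicted deer, actually a deer
    elif a == "deer" and p == "person":
        counts["FN"] += 1      # predicted person, actually a deer
    elif a == "person" and p == "deer":
        counts["FP"] += 1      # predicted deer, actually a person
    else:
        counts["TN"] += 1      # predicted person, actually a person

print(counts)  # {'TP': 2, 'FN': 1, 'FP': 1, 'TN': 2}
```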
From this, we can calculate several metrics to evaluate our model:

- Accuracy = (TP + TN) / (TP + TN + FP + FN): what fraction of all predictions were correct?
- Precision = TP / (TP + FP): of everything we predicted positive, how much really was positive?
- Recall = TP / (TP + FN): of everything that really was positive, how much did we catch?
There are tradeoffs between precision and recall. For example, if we want to be very sure we are only shooting deer (high precision), we may miss some deer (low recall). Conversely, if we want to make sure we shoot all the deer (high recall), we may accidentally shoot some people (low precision).
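To see the tradeoff in numbers, here is a small Python sketch comparing two hypothetical hunters on made-up counts: a cautious one who rarely calls "deer" and a trigger-happy one who calls "deer" at anything gray-ish.

```python
def precision(tp, fp):
    """Of everything predicted positive, how much really was positive?"""
    return tp / (tp + fp)

def recall(tp, fn):
    """Of everything actually positive, how much did we predict positive?"""
    return tp / (tp + fn)

# Hypothetical counts for a forest with 10 deer in it.
cautious      = {"tp": 6,  "fp": 0, "fn": 4}  # only shoots when very sure
trigger_happy = {"tp": 10, "fp": 5, "fn": 0}  # shoots at anything gray-ish

for name, c in [("cautious", cautious), ("trigger-happy", trigger_happy)]:
    print(name,
          "precision =", round(precision(c["tp"], c["fp"]), 2),
          "recall =", round(recall(c["tp"], c["fn"]), 2))
# cautious:       precision = 1.0,  recall = 0.6  (misses some deer)
# trigger-happy:  precision = 0.67, recall = 1.0  (sometimes shoots people)
```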
Imagine we are predicting which people are fraudsters. We have 6 people:

- A and B are fraudsters, and our model flags them as fraud.
- C is a fraudster, but our model says they are ok.
- D and E are ok, but our model flags them as fraud.
- F is ok, and our model says they are ok.

This translates to the confusion matrix:
| | Predicted Fraud | Predicted Ok |
|---|---|---|
| Actual Positive = Fraud | A, B (TP) | C (FN) |
| Actual Negative = Ok | D, E (FP) | F (TN) |
From this, we can calculate:

- Accuracy = (TP + TN) / total = (2 + 1) / 6 = 0.5
- Precision = TP / (TP + FP) = 2 / (2 + 2) = 0.5
- Recall = TP / (TP + FN) = 2 / (2 + 1) ≈ 0.67
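As a check, here is a minimal Python sketch that computes the same numbers from the counts in the fraud table above.

```python
# Counts taken from the fraud confusion matrix above.
tp, fn = 2, 1   # A, B correctly flagged; C missed
fp, tn = 2, 1   # D, E wrongly flagged; F correctly cleared

total = tp + fn + fp + tn
accuracy  = (tp + tn) / total     # 3 / 6  = 0.5
precision = tp / (tp + fp)        # 2 / 4  = 0.5
recall    = tp / (tp + fn)        # 2 / 3  ≈ 0.67

print(f"accuracy={accuracy:.2f} precision={precision:.2f} recall={recall:.2f}")
```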