
Predicting Categories (Classification Task)

Some models predict a categorical variable, such as yes/no. This is called classification. To evaluate a classifier, we compare its predictions against actual results. For example, how well does our model predict which students will pass a class?

Confusion Matrix

Measuring error is done with a confusion matrix, which compares predicted values against actual values.

As an example, imagine we are a hunter trying to find deer in the forest. Are we looking at a deer (positive) or a person (negative)?

Our model can be very simple: if we see something gray, it's a deer; if we see something else, it's a person. How well does this model work?

| | Predicted Positive ("yes, deer") | Predicted Negative ("no, not a deer") |
|---|---|---|
| Actual Positive (deer) | True Positive (TP) | False Negative (FN) |
| Actual Negative (person) | False Positive (FP) | True Negative (TN) |
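As a minimal sketch of how these counts are tallied, the loop below compares paired actual/predicted labels (the label values here are hypothetical, with "deer" as the positive class):

```python
# Tally a confusion matrix from paired actual vs. predicted labels.
# "deer" is the positive class, "person" the negative class.
actual    = ["deer", "deer", "person", "deer", "person"]
predicted = ["deer", "person", "deer", "deer", "person"]

counts = {"TP": 0, "FN": 0, "FP": 0, "TN": 0}
for a, p in zip(actual, predicted):
    if a == "deer" and p == "deer":
        counts["TP"] += 1      # correctly called a deer
    elif a == "deer" and p == "person":
        counts["FN"] += 1      # missed a real deer
    elif a == "person" and p == "deer":
        counts["FP"] += 1      # called a person a deer
    else:
        counts["TN"] += 1      # correctly called a person

print(counts)  # {'TP': 2, 'FN': 1, 'FP': 1, 'TN': 1}
```

Each observation lands in exactly one of the four cells, so the four counts always sum to the number of observations.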

Measuring Accuracy, Precision, and Recall

From this, we can calculate several metrics to evaluate our model:

- Accuracy = (TP + TN) / (TP + TN + FP + FN): the fraction of all predictions that were correct.
- Precision = TP / (TP + FP): of everything we predicted positive, how much actually was positive.
- Recall = TP / (TP + FN): of all actual positives, how many we caught.

There are tradeoffs between precision and recall. For example, if we want to be very sure we are only shooting deer (high precision), we may miss some deer (low recall). Conversely, if we want to make sure we shoot all the deer (high recall), we may accidentally shoot some people (low precision).
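The standard formulas can be written as small helper functions; the example counts passed in below are hypothetical:

```python
# Accuracy, precision, and recall from the four confusion-matrix counts.

def accuracy(tp, fn, fp, tn):
    # Fraction of all predictions that were correct.
    return (tp + tn) / (tp + fn + fp + tn)

def precision(tp, fp):
    # Of everything predicted positive, how much really was positive?
    return tp / (tp + fp)

def recall(tp, fn):
    # Of all actual positives, how many did the model catch?
    return tp / (tp + fn)

# Hypothetical counts: 2 true positives, 1 false negative,
# 1 false positive, 1 true negative.
print(accuracy(tp=2, fn=1, fp=1, tn=1))   # 0.6
print(precision(tp=2, fp=1))              # 0.666...
print(recall(tp=2, fn=1))                 # 0.666...
```

Note that precision and recall each ignore the true negatives, which is why they move independently of accuracy: you can trade one against the other by changing how eager the model is to predict the positive class.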

Example: Fraud Prediction

Imagine we are predicting which people are fraudsters.

We have 6 people:

- A: fraudster, predicted fraud (TP)
- B: fraudster, predicted fraud (TP)
- C: fraudster, predicted ok (FN)
- D: ok, predicted fraud (FP)
- E: ok, predicted fraud (FP)
- F: ok, predicted ok (TN)

This translates to the confusion matrix:

| | Predicted Fraud | Predicted Ok |
|---|---|---|
| Actual Fraud | A, B (TP) | C (FN) |
| Actual Ok | D, E (FP) | F (TN) |

From this, we can calculate:

- Accuracy = (TP + TN) / total = (2 + 1) / 6 = 0.5
- Precision = TP / (TP + FP) = 2 / 4 = 0.5
- Recall = TP / (TP + FN) = 2 / 3 ≈ 0.67
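The fraud example can be checked in code. The tuples below encode the six people exactly as placed in the confusion matrix above (A, B true positives; C a false negative; D, E false positives; F a true negative):

```python
# The six people from the fraud example: (name, actually_fraud, predicted_fraud).
people = [
    ("A", True, True),    # TP
    ("B", True, True),    # TP
    ("C", True, False),   # FN
    ("D", False, True),   # FP
    ("E", False, True),   # FP
    ("F", False, False),  # TN
]

tp = sum(1 for _, a, p in people if a and p)
fn = sum(1 for _, a, p in people if a and not p)
fp = sum(1 for _, a, p in people if not a and p)
tn = sum(1 for _, a, p in people if not a and not p)

print((tp + tn) / len(people))  # accuracy  = 3/6 = 0.5
print(tp / (tp + fp))           # precision = 2/4 = 0.5
print(tp / (tp + fn))           # recall    = 2/3 ≈ 0.67
```

Here the model flags fraud aggressively: it catches two of the three fraudsters (recall 0.67) but half of its fraud flags are wrong (precision 0.5), illustrating the tradeoff described earlier.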