Modeling to predict the future
Our prior course focused on visualizations as a means to understand.
This course builds on that concept, but mainly focuses on creating
models. A model is a simplified version of the real-world, expressed in
mathematical terms.
We have several key challenges:
- Simplicity: we want models that are concise with the fewest possible
fields
- Measurement: all measurement is flawed, but we need to capture the
right amount of detail
- Error: all models are limited, so we need standard ways of capturing
error.
What is Machine Learning?
See a
visual guide
Outcomes:
- Definitions of a feature, split point, recursion, and
overfitting
- Describe a scatterplot, decision tree, histogram,
- Measure accuracy
- Differentiate between training and test data
Elements
To model a problem, we will:
- Get a dataset
- Variables (or predictors or features)
- Target (or outcome)
- Understand the dataset
- Understand each variable’s distribution
- Understand relationships between variables
- Prepare the data
- Fix data problems to make it work better with our algorithm
- Split the data into training & test
- Create a model
- Pick an appropriate model algorithm
- Train a model on the training data
- Measure the accuracy of the model on training data
- Numerical or categorical (false positive, true negative, …)
- Measure the accuracy of the model on testing data
- Communicate results
- Charts
- Written / presentation
- Mathematical formulas