Statistics in R - Introduction

This corresponds to the DataCamp Introduction to Statistics in R course.

It also pulls concepts from Regression and Other Stories, a fantastic statistics book.

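There are two main branches of statistics: descriptive statistics, which summarizes the data at hand, and inferential statistics, which uses a sample to draw conclusions about a larger population.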

Data types:

Measures

Mean versus average

Distributions

Normal distribution

  • Describe the concept of a normal distribution
  • Describe why having a normal distribution is useful
  • List some items having (or not) a normal distribution

3-minute data science “Normal distribution”: https://www.youtube.com/watch?v=3VYupIsbLlY

Normal Distribution
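
As a rough illustration of why the normal distribution is useful, the sketch below simulates values with rnorm and checks how many fall within one standard deviation of the mean. The variable name heights and the mean and standard deviation values are made up for illustration and do not come from the course.

```r
# Hypothetical example: simulated adult heights in cm.
# The mean (170) and sd (10) are assumed values, not real data.
set.seed(42)
heights <- rnorm(n = 10000, mean = 170, sd = 10)

# Sample statistics should land close to the parameters we asked for
mean(heights)
sd(heights)

# Share of simulated values within one standard deviation of the mean
within_one_sd <- mean(abs(heights - 170) <= 10)
within_one_sd

# Theoretical probability of the same interval (about 0.68)
pnorm(180, mean = 170, sd = 10) - pnorm(160, mean = 170, sd = 10)

# A histogram should show the familiar bell shape
hist(heights, breaks = 50)
```

Because the shape is fully described by its mean and standard deviation, statements such as "about 68% of values lie within one standard deviation" hold for any normally distributed quantity.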

Uniform

All outcomes have an equal probability.

Use runif to draw random values from a continuous uniform distribution.
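
A minimal sketch of the uniform distribution in R, assuming a made-up scenario of waiting between 0 and 30 minutes for a bus. The names wait_times and rolls are hypothetical. Note that runif is continuous; for a discrete case such as a fair die, sample() is one option.

```r
# Hypothetical example: waiting time in minutes, anywhere from 0 to 30
set.seed(42)
wait_times <- runif(n = 10000, min = 0, max = 30)

range(wait_times)  # all values fall between min and max
mean(wait_times)   # close to the midpoint, 15

# Theoretical probability of waiting 10 minutes or less: 10 / 30
punif(10, min = 0, max = 30)

# Discrete uniform: each face of a fair die is equally likely
rolls <- sample(1:6, size = 10000, replace = TRUE)
table(rolls) / length(rolls)  # each share is roughly 1/6
```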

Binomial

Each trial has one of two outcomes: true or false (1 or 0), such as a coin flip.

Use rbinom to simulate binomial outcomes.
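
A short sketch of rbinom, assuming a fair coin (prob = 0.5) as the illustration. The names flips and heads_per_trial are hypothetical. In rbinom, n is the number of draws and size is the number of trials per draw.

```r
set.seed(42)

# Ten single coin flips: each result is 0 or 1
flips <- rbinom(n = 10, size = 1, prob = 0.5)
flips

# 10,000 experiments of 10 flips each: each result is a count of heads (0 to 10)
heads_per_trial <- rbinom(n = 10000, size = 10, prob = 0.5)
mean(heads_per_trial)  # close to size * prob = 5

# Theoretical probability of exactly 7 heads in 10 flips: 120 / 1024
dbinom(7, size = 10, prob = 0.5)

# Theoretical probability of 7 or fewer heads in 10 flips
pbinom(7, size = 10, prob = 0.5)
```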

Probability

Correlation

Outcomes:

  • Describe the role of a correlation
  • Know the difference between the statistical significance of a correlation and its strength.
  • Magnitude
    • Rough rule of thumb for the absolute value of the coefficient: strong above 0.5, moderate above 0.25, weak around 0.2.
    • Understand negative versus positive sign
  • Problems
    • Non-linear relationships.
      • Applying a log transformation can sometimes make the relationship linear
    • Correlation v. causation v. confounding
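
The outcomes above can be sketched with simulated data. This is an illustration under stated assumptions, not course material: all variable names and coefficients are made up. The first part shows a weak correlation that is still statistically significant because the sample is large; the second shows a log transformation straightening out an exponential relationship.

```r
set.seed(42)
n <- 10000

# Strength versus significance: a weak relationship in a large sample
x <- rnorm(n)
y <- 0.1 * x + rnorm(n)       # assumed small effect of x on y

result <- cor.test(x, y)      # Pearson correlation test
result$estimate               # coefficient is small (weak strength)
result$p.value                # yet the p-value is small (significant)

# Non-linear relationship: v grows exponentially with u
u <- runif(n, min = 1, max = 10)
v <- exp(u) * exp(rnorm(n, sd = 0.2))

cor_raw <- cor(u, v)          # linear correlation understates the link
cor_log <- cor(u, log(v))     # after a log transform it is close to 1
c(raw = cor_raw, log = cor_log)
```

Neither number says anything about causation: a strong, significant correlation can still be driven by a confounding variable that affects both quantities.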

Help: https://www.youtube.com/watch?v=rijqfllOq6g

Good discussion and examples of correlation

Experiment

Controlled experiment