How do we get data into R?

read_csv

There is a collection of read functions, which all use similar parameters.
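
As a small sketch of what "similar parameters" means, the code below reads the same table with read_tsv and read_delim, two other readr functions. The temporary files and their contents are made up for illustration; col_names and skip are passed with their default values only to show that the argument names carry over.

```r
library(readr)  # also loaded by library(tidyverse)

# Hypothetical sample files: the same table, tab-separated and semicolon-separated.
tsv_path <- tempfile(fileext = ".tsv")
writeLines(c("id\tname", "1\tBob", "2\tSam"), tsv_path)

semi_path <- tempfile(fileext = ".txt")
writeLines(c("id;name", "1;Bob", "2;Sam"), semi_path)

# Same argument names as read_csv; only the delimiter handling differs.
from_tsv   <- read_tsv(tsv_path, col_names = TRUE, skip = 0)
from_delim <- read_delim(semi_path, delim = ";", col_names = TRUE, skip = 0)

print(from_tsv)
print(from_delim)
```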

Basic example

This will read a text file that looks like this:

id,name
1,"Bob"
2,"Sam"

We only need to pass the name of the file.

library(tidyverse)

t <- read_csv("r15-read.txt")
print(t)
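
If you do not have r15-read.txt on disk, the following sketch recreates the basic example end to end. It writes the three lines shown above to a temporary file (the temporary path is an assumption of this sketch, not part of the original example) and reads them back.

```r
library(readr)  # read_csv() lives in readr, which library(tidyverse) also loads

# Write the sample data to a temporary file so the example is self-contained.
path <- tempfile(fileext = ".txt")
writeLines(c('id,name', '1,"Bob"', '2,"Sam"'), path)

people <- read_csv(path)
print(people)

# The first line of the file is used for the column names.
print(names(people))
```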

Skip starting lines

This data file has some starting lines that we need to skip.

Here is the data file:

id,name
1,"Bob"
2,"Sam"

Tell read_csv to skip the first 2 lines.

library(tidyverse)

t <- read_csv("r15-read.txt", skip = 2)
print(t)
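
A self-contained sketch of skip is shown below. The two leading junk lines are invented for illustration; real files will have their own preamble, so the number passed to skip has to match the file.

```r
library(readr)

# Hypothetical file: two lines of preamble before the real header row.
path <- tempfile(fileext = ".txt")
writeLines(c(
  "Exported from a hypothetical system",
  "Not part of the table",
  "id,name",
  '1,"Bob"',
  '2,"Sam"'
), path)

# skip = 2 drops the two preamble lines; the next line becomes the header.
people <- read_csv(path, skip = 2)
print(people)
```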

Fix missing column names

Some data files are missing column names. By default, read_csv would treat the first row of data as the header.

1,"Bob"
2,"Sam"

Tell read_csv the names for the columns with col_names.

library(tidyverse)

t <- read_csv("r15-read.txt", col_names = c('id', 'name'))
print(t)
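
The sketch below compares three ways of reading a headerless file, using a temporary file created just for this illustration. It shows why col_names matters: without it, the first data row is consumed as the header.

```r
library(readr)

# Hypothetical headerless file with two data rows.
path <- tempfile(fileext = ".txt")
writeLines(c('1,"Bob"', '2,"Sam"'), path)

# Default: the first data row is taken as the header, so only one row remains.
wrong <- read_csv(path)

# col_names = FALSE keeps every row and generates the names X1, X2.
generic <- read_csv(path, col_names = FALSE)

# Supplying the names keeps every row and labels the columns.
named <- read_csv(path, col_names = c("id", "name"))

print(nrow(wrong))
print(names(generic))
print(named)
```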

Fix NA values

Some data files have missing values. These can take different forms, such as blank strings, spaces, “magic” numbers, or the literal text “NA”.

id,name
1, 
NA,"Sam"
-1,"Joe"

Tell read_csv which values should be transformed into NA.

library(tidyverse)

t <- read_csv("r15-read.txt", na = c('', ' ', 'NA', '-1'))
print(t)
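
To see what the na argument changes, this sketch reads the sample data twice from a temporary file (created here only for illustration): once with the defaults and once with the extra values. With the defaults, the magic number -1 survives as a real value.

```r
library(readr)

# The sample data from above, written to a temporary file.
path <- tempfile(fileext = ".txt")
writeLines(c('id,name', '1, ', 'NA,"Sam"', '-1,"Joe"'), path)

# Default na values: -1 is kept as an ordinary number.
default_read <- read_csv(path)

# Custom na values: -1 is also turned into NA.
cleaned <- read_csv(path, na = c("", " ", "NA", "-1"))

print(default_read)
print(cleaned)

# Count the missing values per column after cleaning.
print(colSums(is.na(cleaned)))
```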

read_xlsx

We can use read_xlsx to load Excel files.

Basic example

If the workbook has more than one sheet, we typically need to provide the name of the sheet; otherwise read_xlsx reads the first one.

library(tidyverse)
library(readxl)

t <- read_xlsx("r15-read.xlsx", sheet = 'Sheet of Data')
print(t)
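
When you do not know the sheet names in advance, excel_sheets lists them. The sketch below uses datasets.xlsx, an example workbook that ships with readxl, rather than r15-read.xlsx, so the sheet name "mtcars" belongs to that example file.

```r
library(readxl)

# Path to an example workbook bundled with the readxl package.
path <- readxl_example("datasets.xlsx")

# List the sheet names before deciding which one to read.
sheets <- excel_sheets(path)
print(sheets)

# Read one sheet by name.
cars <- read_xlsx(path, sheet = "mtcars")
print(cars)

# With no sheet argument, the first sheet is read.
first_sheet <- read_xlsx(path)
```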

Complex example

We can use options similar to those of read_csv.

library(tidyverse)
library(readxl)

t <- read_xlsx("r15-read.xlsx", 
               sheet = 'Weird Data Sheet',
               na = c('', ' ', 'NA', '-1'),
               skip = 2)
print(t)
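
For a runnable variant, the sketch below uses deaths.xlsx, another example workbook bundled with readxl, which has notes above and below the actual table. The values skip = 4 and n_max = 10 are specific to that file; n_max is an extra argument not used above that stops reading before the trailing notes.

```r
library(readxl)

# Example workbook bundled with readxl; its "arts" sheet has junk rows
# above and below the real table.
path <- readxl_example("deaths.xlsx")

deaths <- read_xlsx(path,
                    sheet = "arts",
                    na = c("", "NA"),
                    skip = 4,     # rows of notes above the header
                    n_max = 10)   # stop before the notes below the table
print(deaths)
```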

Janitor

Janitor is a helpful library for cleaning up badly formatted column titles. I generally use clean_names, the function that fixes titles.

Treat it like dplyr, using the pipe symbol as part of your clean-up code.

library(tidyverse)
library(janitor)

bad_tibble <- tibble(
  `Title with spaces and CAPITALIZATION` = c(1, 2, 3, 4)
)

good_tibble <- bad_tibble %>% 
  janitor::clean_names()

print(good_tibble)
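
Here is a slightly larger sketch with several kinds of messy titles, to show what clean_names does with spaces, capital letters, and punctuation. The column names and values are invented for illustration.

```r
library(tibble)
library(janitor)

# Hypothetical tibble with awkward column titles.
messy <- tibble(
  `First Name`  = c("Bob", "Sam"),
  `AGE (years)` = c(41, 29),
  `Zip-Code`    = c("12345", "67890")
)

# clean_names() converts the titles to lowercase snake_case.
tidy <- clean_names(messy)

print(names(messy))
print(names(tidy))
```

The cleaned names no longer need backticks, so columns can be referenced as tidy$first_name instead of messy$`First Name`.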