How do we get data into R?
There is a collection of read functions (read_csv, read_tsv, read_delim, and others), and they all take similar parameters.
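As a sketch of that shared interface, this writes the same small table to temporary files with two different separators and reads it back with read_tsv() and read_delim(). Both functions come from readr, the tidyverse package that also provides read_csv(). The temporary files and object names are illustrative only.

```r
library(readr)

# Write the same two-row table with a tab separator and a semicolon separator
tsv_path <- tempfile(fileext = ".tsv")
semi_path <- tempfile(fileext = ".txt")
writeLines(c("id\tname", "1\tBob", "2\tSam"), tsv_path)
writeLines(c("id;name", "1;Bob", "2;Sam"), semi_path)

# read_tsv() fixes the delimiter to a tab; read_delim() takes it as an argument
from_tsv <- read_tsv(tsv_path)
from_semi <- read_delim(semi_path, delim = ";")

print(from_tsv)
print(from_semi)
```

Apart from the delimiter, the two calls look the same, which is why the options shown for read_csv carry over to its siblings.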
This will read a text file that looks like this:
id,name
1,"Bob"
2,"Sam"
We only need to pass the name of the file.
library(tidyverse)
# read_csv() takes column names from the first line and guesses column types from the values
t <- read_csv("r15-read.txt")
print(t)
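To try this without the original r15-read.txt, a minimal self-contained sketch can write the same contents to a temporary file first. The temporary path and the object name people are assumptions for illustration; spec() then prints the column types read_csv guessed.

```r
library(readr)

# Recreate the example file in a temporary location (stand-in for r15-read.txt)
path <- tempfile(fileext = ".csv")
writeLines(c('id,name', '1,"Bob"', '2,"Sam"'), path)

# Column names come from the first line; the surrounding quotes are removed
people <- read_csv(path)
print(people)

# Show the column specification that read_csv() guessed
print(spec(people))
```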
This data file starts with some extra lines that we need to skip.
Here is a datafile
id,name
1,"Bob"
2,"Sam"
Tell read_csv to skip the first 2 lines.
library(tidyverse)
t <- read_csv("r15-read.txt", skip = 2)
print(t)
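A self-contained sketch of the same idea, assuming a hypothetical file whose first two lines are free text before the real header row (the second preamble line is made up for this example):

```r
library(readr)

# Hypothetical file with two preamble lines before the header row
path <- tempfile(fileext = ".csv")
writeLines(c("Here is a datafile",
             "Exported for this tutorial",
             "id,name",
             '1,"Bob"',
             '2,"Sam"'), path)

# skip = 2 drops the two preamble lines, so "id,name" becomes the header
people <- read_csv(path, skip = 2)
print(people)
```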
Some data files are missing column names.
1,"Bob"
2,"Sam"
Tell read_csv the column names with col_names.
library(tidyverse)
t <- read_csv("r15-read.txt", col_names = c('id', 'name'))
print(t)
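A sketch comparing the two ways of handling a headerless file, using a temporary copy of the data above. col_names = FALSE makes read_csv generate names (X1, X2, ...), while a character vector supplies them directly. The compact col_types string "ic" (integer, character) is an optional extra shown here as an assumption about what types you want, not something the original example requires.

```r
library(readr)

# Hypothetical headerless file matching the example above
path <- tempfile(fileext = ".csv")
writeLines(c('1,"Bob"', '2,"Sam"'), path)

# col_names = FALSE keeps every row as data and generates names X1, X2, ...
generic <- read_csv(path, col_names = FALSE)

# Supply names directly; col_types "ic" means integer, then character
people <- read_csv(path, col_names = c("id", "name"), col_types = "ic")

print(generic)
print(people)
```

Without either option, read_csv would treat the first data row (1,"Bob") as the header and return only one row.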
Some data files have missing values. These can take different forms, such as empty strings, spaces, “magic” numbers like -1, or the text “NA”.
id,name
1,
NA,"Sam"
-1,"Joe"
Tell read_csv which values should be transformed into NA.
library(tidyverse)
# Any field matching one of these strings becomes NA, in every column
t <- read_csv("r15-read.txt", na = c('', ' ', 'NA', '-1'))
print(t)
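The na argument applies to every column, so -1 would also disappear from any column where it is a real value. A more targeted alternative, sketched here with dplyr's na_if() (a swapped-in technique, not part of the original example), reads with the defaults and converts -1 only in the id column. The temporary file recreates the data above.

```r
library(readr)
library(dplyr)

# Recreate the example file with several kinds of missing values
path <- tempfile(fileext = ".csv")
writeLines(c('id,name', '1,', 'NA,"Sam"', '-1,"Joe"'), path)

# Global approach: these strings count as missing in every column
everywhere <- read_csv(path, na = c('', ' ', 'NA', '-1'))

# Targeted approach: default na = c("", "NA"), then -1 -> NA only in id
targeted <- read_csv(path) %>%
  mutate(id = na_if(id, -1))

print(everywhere)
print(targeted)
```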
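We can use read_xlsx from the readxl package to load Excel files. readxl is installed with the tidyverse but is not loaded by library(tidyverse), so we load it separately.
read_xlsx reads the first sheet by default, so we typically provide the name of the sheet we want.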
library(tidyverse)
library(readxl)
t <- read_xlsx("r15-read.xlsx", sheet = 'Sheet of Data')
print(t)
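If you don't know the sheet names, excel_sheets() lists them. This sketch uses the datasets.xlsx example workbook that ships with readxl (found via readxl_example()) in place of r15-read.xlsx, and shows that a sheet can also be chosen by position.

```r
library(readxl)

# readxl ships small example workbooks; datasets.xlsx has one sheet per dataset
path <- readxl_example("datasets.xlsx")

# List the sheet names so we don't have to guess them
print(excel_sheets(path))

# Select a sheet by name, or by position (sheet = 1 is the first sheet)
cars <- read_xlsx(path, sheet = "mtcars")
first_sheet <- read_xlsx(path, sheet = 1)
print(cars)
```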
read_xlsx accepts many of the same options as read_csv, such as na and skip.
library(tidyverse)
library(readxl)
t <- read_xlsx("r15-read.xlsx",
               sheet = 'Weird Data Sheet',
               na = c('', ' ', 'NA', '-1'),
               skip = 2)
print(t)
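A sketch of more shared options, again using readxl's bundled datasets.xlsx rather than the original workbook. Here skip = 1 drops the header row, col_names supplies our own names (chosen for this example), and n_max limits how many data rows are read, just as these arguments do in read_csv.

```r
library(readxl)

path <- readxl_example("datasets.xlsx")

# skip = 1 drops the original header row; col_names replaces it;
# n_max = 5 stops after five data rows
flowers <- read_xlsx(path,
                     sheet = "iris",
                     skip = 1,
                     col_names = c("sepal_length", "sepal_width",
                                   "petal_length", "petal_width", "species"),
                     n_max = 5)
print(flowers)
```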
The janitor package helps clean up messy data, including badly formatted column titles. The function I use most is clean_names(), which fixes those titles.
Treat it like a dplyr verb, using the pipe (%>%) as part of your clean-up code.
library(tidyverse)
library(janitor)
bad_tibble <- tibble(
  `Title with spaces and CAPITALIZATION` = c(1, 2, 3, 4)
)
# clean_names() converts the title to snake_case: title_with_spaces_and_capitalization
good_tibble <- bad_tibble %>%
  janitor::clean_names()
print(good_tibble)
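In practice clean_names() often sits right after the read call. This sketch assumes a hypothetical CSV with messy headers ("Customer ID", "First Name") and also shows make_clean_names(), janitor's version of the same rules for a plain character vector.

```r
library(readr)
library(dplyr)
library(janitor)

# Hypothetical CSV with messy headers
path <- tempfile(fileext = ".csv")
writeLines(c("Customer ID,First Name", '1,"Bob"', '2,"Sam"'), path)

# clean_names() slots into the pipe right after reading
customers <- read_csv(path) %>%
  clean_names()
print(names(customers))

# make_clean_names() applies the same rules to a character vector
print(make_clean_names(c("Customer ID", "First Name")))
```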