These functions allow us to reshape data. Instead of changing individual columns, we change the layout of the whole table, moving from wide to tall format and vice versa.
These functions live in the tidyr package rather than dplyr. But, including tidyverse will load all of the appropriate libraries.
Going wide means we take multiple rows and turn them into a single row with multiple columns.
The two key arguments are:
names_from is a field; each of its values will turn into a new column.
values_from is a field; each of its values will turn into the data for a new column.

library(tidyverse)
t_long <- tibble(
  state = c('CA', 'CA', 'CA', 'TX', 'TX', 'TX'),
  year = c(2000, 2001, 2002, 2000, 2001, 2002),
  sales = c(1, 2, 3, 11, 12, 13)
)
t_wide <- t_long %>%
  pivot_wider(names_from = year, values_from = sales)
print(t_wide)
## # A tibble: 2 × 4
## state `2000` `2001` `2002`
## <chr> <dbl> <dbl> <dbl>
## 1 CA 1 2 3
## 2 TX 11 12 13
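When the long data is missing some combinations, pivot_wider leaves NA in the corresponding cells, and numeric names such as 2000 need backticks. The sketch below shows one way to handle both, using the names_prefix and values_fill arguments of pivot_wider. The tibble t_gaps and the prefix 'y' are hypothetical, chosen only for illustration.

```r
library(tidyverse)

# Hypothetical data: TX has no row for 2002
t_gaps <- tibble(
  state = c('CA', 'CA', 'CA', 'TX', 'TX'),
  year  = c(2000, 2001, 2002, 2000, 2001),
  sales = c(1, 2, 3, 11, 12)
)

t_gaps_wide <- t_gaps %>%
  pivot_wider(
    names_from   = year,
    values_from  = sales,
    names_prefix = 'y',  # new columns are named y2000, y2001, y2002
    values_fill  = 0     # missing state/year combinations become 0 instead of NA
  )

print(t_gaps_wide)
```

Whether 0 is a sensible fill depends on the data; leaving the NA in place is often the safer default when a missing row does not mean "no sales".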
Going long/tall means we take a single row with multiple columns and turn it into multiple rows with a single value column.
The three key arguments are:
cols are the fields/columns we should pivot into rows. It accepts the same selection syntax as dplyr::select:
  c(field1, field2) lists the columns explicitly.
  field1:field10 selects all of the columns from one field to another. Very good for year columns.
  starts_with('value') picks all fields with the beginning value.
names_to is a “string” to call the new column for the names.
values_to is a “string” to call the new column for the values.

library(tidyverse)
t_wide <- tibble(
  year = c(2000, 2001, 2002),
  ca = c(1, 2, 3),
  tx = c(11, 12, 13)
)
t_long <- t_wide %>%
  pivot_longer(cols = c(ca, tx), names_to = 'state', values_to = 'sales')
print(t_long)
## # A tibble: 6 × 3
## year state sales
## <dbl> <chr> <dbl>
## 1 2000 ca 1
## 2 2000 tx 11
## 3 2001 ca 2
## 4 2001 tx 12
## 5 2002 ca 3
## 6 2002 tx 13
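The range and starts_with selections described for cols can be sketched on a table with one column per year, the same shape as the pivot_wider output earlier. This is a minimal illustration, not the only way to do it: the tibble t_years and the prefix '20' are assumptions made for the example. Since the old column names arrive as character strings, the year is converted back to a number afterwards.

```r
library(tidyverse)

# Hypothetical wide table: one column per year
t_years <- tibble(
  state  = c('CA', 'TX'),
  `2000` = c(1, 11),
  `2001` = c(2, 12),
  `2002` = c(3, 13)
)

# Select a run of adjacent columns with first:last
t_long_range <- t_years %>%
  pivot_longer(cols = `2000`:`2002`, names_to = 'year', values_to = 'sales') %>%
  mutate(year = as.integer(year))  # column names come through as strings

# Same selection, written as "every column whose name starts with '20'"
t_long_prefix <- t_years %>%
  pivot_longer(cols = starts_with('20'), names_to = 'year', values_to = 'sales') %>%
  mutate(year = as.integer(year))

print(t_long_range)
```

Both selections pick the same three columns here, so the two results should match. The range form depends on column order, while starts_with depends only on the names.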