This tutorial also supports DataCamp’s Introduction to Data Visualization with ggplot2.
It builds on the prior section by discussing ways to clean up basic charts.
Draw a line or box on the chart.
library(tidyverse)
## Warning: package 'ggplot2' was built under R version 4.3.3
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ ggplot2 3.5.1 ✔ tibble 3.2.1
## ✔ lubridate 1.9.3 ✔ tidyr 1.3.1
## ✔ purrr 1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
ggplot(data = mpg) +
  geom_point(mapping = aes(x = cty, y = hwy)) +
  geom_vline(xintercept = 20) +
  geom_hline(yintercept = 30) +
  geom_rect(xmin = 20, xmax = 25, ymin = 25, ymax = 30,
            alpha = 0.005,
            fill = 'green')
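The very small alpha above is needed because, as far as we can tell, geom_rect() with fixed coordinates is drawn once for every row of mpg, so the rectangles stack on top of each other. A minimal sketch of an alternative, assuming you only want a single box, uses annotate(). The object name p_box and the alpha value of 0.3 are our own choices for illustration.

```r
library(ggplot2)

# annotate() adds one rectangle that is not tied to the rows of the data,
# so alpha behaves like an ordinary opacity value.
p_box <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = cty, y = hwy)) +
  geom_vline(xintercept = 20) +
  geom_hline(yintercept = 30) +
  annotate('rect', xmin = 20, xmax = 25, ymin = 25, ymax = 30,
           alpha = 0.3, fill = 'green')

p_box
```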
Add labels with geom_label_repel() from the ggrepel package, which moves labels so that they do not overlap. There is also geom_text(), but it plots labels directly on top of the data points.
library(ggrepel)
mpg_2seater <- filter(mpg, class == '2seater')
ggplot(data = mpg_2seater) +
  geom_point(mapping = aes(x = cty, y = hwy)) +
  geom_label_repel(mapping = aes(x = cty, y = hwy, label = model))
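For comparison, here is a rough sketch of the same chart with geom_text(). To keep the text off the points we shift it upward with vjust, and check_overlap = TRUE hides labels that would collide with an earlier one. The vjust value and the name p_text are assumptions made for this example.

```r
library(ggplot2)
library(dplyr)

mpg_2seater <- filter(mpg, class == '2seater')

# Shared aesthetics are set once in ggplot() so both layers can use them.
p_text <- ggplot(data = mpg_2seater,
                 mapping = aes(x = cty, y = hwy, label = model)) +
  geom_point() +
  # vjust = -1 moves each label above its point;
  # check_overlap = TRUE skips labels that would overlap a previous label.
  geom_text(vjust = -1, check_overlap = TRUE)

p_text
```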
We can add a variety of labels to a plot.
ggplot(data = mpg) +
  geom_bar(mapping = aes(y = manufacturer)) +
  labs(title = 'Main title',
       subtitle = 'Subtitle title',
       caption = 'Caption at bottom of chart') +
  xlab('Label for x axis') +
  ylab('Label for y axis')
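The axis labels can also go inside labs(), which keeps all of the text in one call. This is a small sketch of that variation. The name p_labs is ours.

```r
library(ggplot2)

# labs() accepts x and y alongside title, subtitle, and caption,
# so separate xlab() / ylab() calls are optional.
p_labs <- ggplot(data = mpg) +
  geom_bar(mapping = aes(y = manufacturer)) +
  labs(title = 'Main title',
       subtitle = 'Subtitle title',
       caption = 'Caption at bottom of chart',
       x = 'Label for x axis',
       y = 'Label for y axis')

p_labs
```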
There are some nice options for themes. ggplot2 ships with several built-in themes, such as the theme_void() used below.
You can also find more themes in the ggthemes package.
ggplot(data = mpg) +
  geom_point(mapping = aes(x = cty, y = hwy)) +
  theme_void()
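A built-in theme can be combined with theme() to adjust individual elements. The sketch below uses theme_minimal() with a larger base font and removes the minor grid lines. These particular settings are only an illustration, not a recommendation from the course.

```r
library(ggplot2)

p_theme <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = cty, y = hwy)) +
  # Start from a complete built-in theme with a larger base font size...
  theme_minimal(base_size = 14) +
  # ...then override a single element on top of it.
  theme(panel.grid.minor = element_blank())

p_theme
```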
Set a max/min for an axis. Points that fall outside the limits are dropped, which is why the warning below appears.
ggplot(data = mpg) +
  geom_point(mapping = aes(x = cty, y = hwy)) +
  xlim(0, 20) +
  ylim(0, 20)
## Warning: Removed 145 rows containing missing values or values outside the scale range
## (`geom_point()`).
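If you want to zoom in without removing any rows, coord_cartesian() is an alternative. It changes the visible window but leaves the data alone, so no warning is raised. A minimal sketch, with names of our own choosing:

```r
library(ggplot2)

# Rows that xlim(0, 20) / ylim(0, 20) would drop; compare this count
# with the number reported in the warning.
n_outside <- sum(mpg$cty > 20 | mpg$hwy > 20)

# coord_cartesian() zooms the view only; every row is still part of the plot.
p_zoom <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = cty, y = hwy)) +
  coord_cartesian(xlim = c(0, 20), ylim = c(0, 20))

p_zoom
```

This matters most for geoms that compute summaries, such as geom_smooth(), because dropped rows change the result.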
We can customize the axis scales.
You need to match the type of scale to your data type. Is the data continuous (i.e., a number) or discrete (generally text)?
A discrete scale handles a vector of text values. Set custom labels using a vector, with one label per category in the order of the categories (alphabetical for text data).
ggplot(data = mpg) +
  geom_point(mapping = aes(y = class, x = hwy)) +
  scale_y_discrete(
    labels = c('2 Seater', 'Compact Car', 'Midsize', 'Minivan',
               'Pickup', 'Sub-compact', 'SUV'),
    name = 'Car Classification')
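An unnamed label vector depends on getting the order exactly right. A safer variation, sketched here, is a named vector that maps each value in the data to its label, so the order no longer matters. The name class_labels is our own.

```r
library(ggplot2)

# Names are the values found in mpg$class; elements are the labels to show.
class_labels <- c('2seater'    = '2 Seater',
                  'compact'    = 'Compact Car',
                  'midsize'    = 'Midsize',
                  'minivan'    = 'Minivan',
                  'pickup'     = 'Pickup',
                  'subcompact' = 'Sub-compact',
                  'suv'        = 'SUV')

p_class <- ggplot(data = mpg) +
  geom_point(mapping = aes(y = class, x = hwy)) +
  scale_y_discrete(labels = class_labels, name = 'Car Classification')

p_class
```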
A continuous scale is for a series of numbers.
We can set custom breaks, as well as the min/max.
ggplot(data = mpg) +
  geom_point(mapping = aes(y = hwy, x = hwy)) +
  scale_x_continuous(n.breaks = 5, limits = c(20, 30)) +
  scale_y_continuous(breaks = c(15, 20, 25))
## Warning: Removed 100 rows containing missing values or values outside the scale range
## (`geom_point()`).
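When the breaks are evenly spaced, seq() saves typing them out. The sketch below also picks limits wide enough to keep every row, so nothing is removed. The break spacing and the limits are assumptions chosen to fit the hwy values in mpg.

```r
library(ggplot2)

# Evenly spaced breaks every 5 mpg.
hwy_breaks <- seq(from = 10, to = 45, by = 5)

p_breaks <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = cty, y = hwy)) +
  # Limits of 10 to 45 cover the full range of hwy, so no rows are dropped.
  scale_y_continuous(breaks = hwy_breaks, limits = c(10, 45))

p_breaks
```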
A log scale helps us see data that grows exponentially.
ggplot(data = mpg) +
  geom_point(mapping = aes(y = hwy, x = hwy)) +
  scale_x_log10()
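The effect is easier to see with data that really does grow exponentially. The sketch below uses a small made-up data frame (growth, which doubles at every step) that is not part of the tutorial's data. On a log scale the points fall on a straight line.

```r
library(ggplot2)

# Made-up example data: the value doubles at each step.
growth <- data.frame(step = 1:10, value = 2^(1:10))

p_log <- ggplot(data = growth) +
  geom_point(mapping = aes(x = step, y = value)) +
  # On a log10 axis, equal ratios take up equal distances.
  scale_y_log10()

p_log
```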
Dates and datetimes are continuous values, but they don’t use a continuous scale. Use scale_x_date and scale_x_datetime for additional options.
Our main options are:
- labels = scales::label_date("format string")
  - Find the "format string" options by pressing F1 on label_date, going to its format section, and clicking on strptime(). Scroll down for a list of options.
  - "%Y-%m-%d" shows as '2023-01-09'
  - "%H:%M:%S" shows as '02:00:00'
- date_breaks = "number period"
  - "number period" is a combination of a number and a period (such as hour, minute, year, etc.)
  - '1 month'
  - '3 hours'
- limits = c(start_date, end_date)
  - limits = c(ymd('2023-01-01'), ymd('2023-01-30'))
  - limits = c(ymd_hm('2023-01-01 06:00am'), ymd_hm('2023-01-01 06:00pm'))
See label_date in the scales package for help on the labels.
See lubridate for help on dealing with dates.
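A quick way to try out a format string before putting it in a scale is to pass it to format(), which uses the same strptime() codes. This is a small sketch with a made-up datetime. The variable names are ours.

```r
library(lubridate)

# A sample datetime; ymd_hms() returns a value in UTC by default.
opened <- ymd_hms('2023-01-09 02:00:00')

date_label <- format(opened, '%Y-%m-%d')  # "2023-01-09"
time_label <- format(opened, '%H:%M:%S')  # "02:00:00"

date_label
time_label
```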
library(lubridate)
date_tibble <- tibble(
  open = c(ymd_hm('2023-01-01 8:00am'),
           ymd_hm('2023-01-02 9:00am'),
           ymd_hm('2023-01-09 3:30pm'),
           ymd_hm('2023-01-25 5:45pm'))
)
ggplot(data = date_tibble) +
  geom_point(mapping = aes(y = open, x = open)) +
  scale_y_datetime(labels = scales::label_date("%Y-%m-%d"),
                   date_breaks = '1 week',
                   limits = c(
                     ymd_hm('2023-01-01 00:00'),
                     ymd_hm('2023-02-15 00:00'))
  ) +
  scale_x_datetime(labels = scales::label_time("%H:%M:%S"),
                   date_breaks = '100 hours')
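Breaks may also need formatting to fix their labels. The scales package provides labelling functions such as label_dollar() and label_percent().
Use accuracy = 0.1 to round percent labels to the nearest 0.1%, or 0.01 to round to the nearest 0.01%. accuracy = 1, as used below, rounds to whole percents.
ggplot(data = mpg) +
  geom_point(mapping = aes(y = cty, x = hwy)) +
  scale_x_continuous(labels = scales::label_dollar()) +
  scale_y_continuous(labels = scales::label_percent(accuracy = 1))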
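Each label_*() function returns another function, so you can call it directly on a few numbers to preview the labels. The sketch below uses made-up inputs. Note that label_percent() multiplies by 100 by default, so it expects proportions such as 0.1234 rather than whole numbers.

```r
library(scales)

# accuracy is applied to the displayed percentage, not the raw proportion.
pct_whole <- label_percent(accuracy = 1)(0.1234)    # "12%"
pct_tenth <- label_percent(accuracy = 0.1)(0.1234)  # "12.3%"

# label_dollar() adds the currency symbol and thousands separator.
dollars <- label_dollar()(1234.5)                   # "$1,234.50"

pct_whole
pct_tenth
dollars
```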
Use expand to give some extra space around the start and end points of the axis.
expand = c(multiply, add) can be a little confusing.
multiply is a fraction of the axis range that is added to each end of the axis. add is a fixed amount, in data units, that is subtracted from the lower limit and added to the upper limit.
The plot below gives some extra space. On the y axis, the range of 40 is multiplied by 1, so the axis extends by 40 on each side, from -40 to 80. On the x axis, we add 5 to each side, so the axis runs from -5 to 55.
ggplot(data = mpg) +
  geom_point(mapping = aes(y = cty, x = hwy)) +
  scale_x_continuous(limits = c(0, 50), expand = c(0, 5)) +
  scale_y_continuous(limits = c(0, 40), expand = c(1, 0))
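The arithmetic behind expand can be checked by hand. The helper below, expanded_limits(), is a hypothetical function written for this tutorial, not part of ggplot2. It is a sketch of our understanding of the rule: each end moves outward by multiply times the range, plus add. It is applied to the limits and expand values used in the plot above.

```r
# Hypothetical helper (not a ggplot2 function): computes the axis range
# after applying expand = c(mult, add) to a pair of limits.
expanded_limits <- function(limits, mult = 0, add = 0) {
  span <- diff(limits)  # width of the unexpanded axis
  c(limits[1] - mult * span - add,
    limits[2] + mult * span + add)
}

x_range <- expanded_limits(c(0, 50), mult = 0, add = 5)  # -5 55
y_range <- expanded_limits(c(0, 40), mult = 1, add = 0)  # -40 80

x_range
y_range
```

ggplot2 also provides expansion(mult = , add = ) as a more readable way to build the value passed to expand.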