
Visualize Data with Seaborn

This tutorial shows common charts made with the Seaborn and Matplotlib libraries.

Outcomes:

Create a bar plot
Create a count plot
Create a histogram
Create a box plot
Create a scatter plot
Customize chart titles and axis labels
Change chart styles and color palettes

Links:

template

Optional Reading

Handbook: Seaborn

Bar Plot

Use bar plots to compare means or medians across categories.

Required:

data = dataframe: your dataset
x = 'fieldname of category': categorical variable
y = 'fieldname of quantitative': quantitative variable (swap x and y for a horizontal bar plot)

Optional:

hue = 'fieldname of category': adds color grouping for each bar
estimator = np.mean: function to compute the value to be plotted (default is mean, but you can use np.median, np.sum, etc.)
errorbar = ('ci', 95): confidence interval for the estimate (default is 95%)

# Barplot Example
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np

df_penguins = sns.load_dataset("penguins")

# Bar plot: median body mass by species, split by sex
# (y is categorical and x is quantitative, so the bars are horizontal)
sns.barplot(data=df_penguins, y="species", x="body_mass_g", hue="sex",
            errorbar=("ci", 50), estimator=np.median)
plt.title("Bar Plot: Median Body Mass by Species and Sex")
plt.show()
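To make the role of estimator concrete, the sketch below computes with pandas the per-category value that a bar represents. It is only an illustration: the small DataFrame and the name df_demo are made up and are not part of the penguins dataset.

```python
import pandas as pd

# Made-up illustrative data (hypothetical values, not the real penguins dataset)
df_demo = pd.DataFrame({
    "species": ["Adelie", "Adelie", "Adelie", "Gentoo", "Gentoo", "Gentoo"],
    "body_mass_g": [3500, 3700, 4500, 5000, 5200, 6000],
})

# The estimator reduces each category to a single number: the bar length.
# With the median, a single heavy animal shifts the bar less than with the mean.
medians = df_demo.groupby("species")["body_mass_g"].median()
means = df_demo.groupby("species")["body_mass_g"].mean()

print(medians)
print(means)
```

Comparing the two results shows why the choice of estimator matters when a category contains unusually large or small values.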


Count Plot

Similar to a bar plot, but each bar shows the number of rows in a category, so you pass only one variable (x or y).

# Count plot: frequency of species

sns.countplot(data=df_penguins, x="species")
plt.title("Count Plot: Number of Penguins by Species")
plt.show()
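The bar heights in a count plot are simply row counts per category. As a rough check, the same numbers can be obtained with pandas value_counts. The DataFrame below is made-up demo data, and df_demo is a hypothetical name.

```python
import pandas as pd

# Made-up illustrative data (hypothetical values)
df_demo = pd.DataFrame({
    "species": ["Adelie", "Gentoo", "Adelie", "Chinstrap", "Adelie", "Gentoo"],
})

# Number of rows per category: the quantity a count plot draws as bar height
counts = df_demo["species"].value_counts()
print(counts)
```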


Line Plot

Show trends over time.

Required:

data = dataframe: your dataset
x = 'fieldname of quantitative': quantitative variable for x-axis
y = 'fieldname of quantitative': quantitative variable for y-axis

Optional:

hue = 'fieldname of category': adds color grouping, drawing one line per category. If a category has only one observation per x value, no error band is drawn.
estimator = np.mean: function to compute the value to be plotted (default is mean)
errorbar = ('ci', 95): None or a tuple with the confidence interval for the estimate (default is 95%)

# Lineplot
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np

# Load example dataset
df_flights = sns.load_dataset("flights")

# Line plot: mean passengers per year (the 12 monthly rows in each year are averaged)
sns.lineplot(data=df_flights, x="year", y="passengers", errorbar=("ci", 95), estimator=np.mean)
plt.title("Line Plot: Passengers Over Time")
plt.show()


Scatter Plot

Show relationship between two quantitative variables.

Required:

data = dataframe: your dataset
x = 'fieldname of quantitative': quantitative variable for x-axis
y = 'fieldname of quantitative': quantitative variable for y-axis

Optional:

hue = 'fieldname of category': adds color grouping for each point.
size = 'fieldname of size': adds size grouping for each point.

# Scatterplot
import seaborn as sns
import matplotlib.pyplot as plt

# Sample data
df_penguins = sns.load_dataset("penguins")

# Add third and fourth variables (species as hue, body_mass_g as size)
sns.scatterplot(data=df_penguins, x="bill_length_mm", y="bill_depth_mm",
                hue="species", size="body_mass_g")
plt.title("Scatter Plot with Hue & Size")
plt.show()
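By default the axes are labeled with the raw column names. One possible way to make them more readable is to draw onto a Matplotlib Axes and set the labels there, as sketched below. The data values and the name df_demo are made up for illustration.

```python
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Made-up measurements (hypothetical values, not the penguins dataset)
df_demo = pd.DataFrame({
    "bill_length_mm": [39.1, 40.3, 46.5, 50.0, 48.7, 45.2],
    "bill_depth_mm": [18.7, 18.0, 17.9, 16.3, 14.1, 15.8],
    "species": ["Adelie", "Adelie", "Chinstrap", "Chinstrap", "Gentoo", "Gentoo"],
    "body_mass_g": [3750, 3250, 3500, 3900, 5050, 5300],
})

fig, ax = plt.subplots()
# hue colors the points by category; size scales them by a quantitative column
sns.scatterplot(data=df_demo, x="bill_length_mm", y="bill_depth_mm",
                hue="species", size="body_mass_g", ax=ax)

# Replace the default column-name labels with readable text
ax.set_xlabel("Bill length (mm)")
ax.set_ylabel("Bill depth (mm)")
ax.set_title("Scatter Plot with Readable Axis Labels (demo data)")
# Call plt.show() to display the figure when running as a script
```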


Box Plots

Show medians, quartiles, and outliers.

Required:

data = dataframe: your dataset
x = 'fieldname of category': categorical variable
y = 'fieldname of quantitative': quantitative variable

Optional:

hue = 'fieldname of category': adds color grouping for each box

# Boxplot
import seaborn as sns
import matplotlib.pyplot as plt

df_penguins = sns.load_dataset("penguins")

# Box plot: flipper length across species
sns.boxplot(data=df_penguins, x="species", y="flipper_length_mm", hue="sex")
plt.title("Box Plot: Flipper Length by Species")
plt.show()
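To see where the box, whiskers, and outlier points come from, the sketch below computes the underlying statistics with NumPy. It assumes the common 1.5 x IQR rule for flagging outliers (the usual default, controlled by the whis parameter), and the values are made up for illustration.

```python
import numpy as np

# Made-up flipper lengths in mm (hypothetical values, with one obvious outlier)
values = np.array([180, 185, 190, 195, 200, 205, 210, 260])

# The box spans the first to third quartile; the line inside is the median
q1, median, q3 = np.percentile(values, [25, 50, 75])
iqr = q3 - q1

# Assumed rule: points beyond 1.5 * IQR from the box are drawn as outliers
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = values[(values < lower_fence) | (values > upper_fence)]

print(q1, median, q3)
print(lower_fence, upper_fence)
print(outliers)
```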


Violin Plots

Similar to a box plot, but also shows the shape of the distribution as a smoothed density curve.

# Violin plot
import seaborn as sns
import matplotlib.pyplot as plt

df_penguins = sns.load_dataset("penguins")

# Violin plot: flipper length across species
sns.violinplot(data=df_penguins, x="species", y="flipper_length_mm", hue="sex")
plt.title("Violin Plot: Flipper Length by Species")
plt.show()
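When hue has exactly two levels, the two halves of each violin can be drawn back to back, which makes the groups easier to compare. The sketch below shows one way to do this with split=True and inner="quart" (quartile lines inside the violin). The data is randomly generated demo data, and df_demo is a hypothetical name.

```python
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Randomly generated demo data (hypothetical values, seeded for repeatability)
rng = np.random.default_rng(0)
df_demo = pd.DataFrame({
    "species": np.repeat(["Adelie", "Gentoo"], 40),
    "sex": np.tile(["Male", "Female"], 40),
    "flipper_length_mm": np.concatenate([
        rng.normal(190, 6, 40),   # Adelie-like lengths
        rng.normal(217, 6, 40),   # Gentoo-like lengths
    ]),
})

fig, ax = plt.subplots()
# split=True draws one half-violin per hue level; inner="quart" marks quartiles
sns.violinplot(data=df_demo, x="species", y="flipper_length_mm",
               hue="sex", split=True, inner="quart", ax=ax)
ax.set_title("Violin Plot: Split by Sex (demo data)")
# Call plt.show() to display the figure when running as a script
```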
