This tutorial shows common charts made with the Seaborn and Matplotlib libraries.
Use bar plots to compare means or medians across categories.
Required:
data = dataframe: your dataset
x = 'fieldname of category': categorical variable
y = 'fieldname of quantitative': quantitative variable (or switch to x for a horizontal bar plot)
Optional:
hue = 'fieldname of category': adds color grouping for each bar
estimator = np.mean: function to compute the value to be plotted (default is mean, but you can use np.median, np.sum, etc.)
errorbar = ('ci', 95): confidence interval for the estimate (default is 95%)
# Barplot Example
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np
df_penguins = sns.load_dataset("penguins")
# Bar plot: median body mass by species and sex, with a 50% confidence interval
sns.barplot(data=df_penguins, y="species", x="body_mass_g", hue="sex", errorbar=("ci", 50), estimator=np.median)
plt.title("Bar Plot: Median Body Mass by Species")
plt.show()
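
To make `estimator` and `errorbar` more concrete, here is a minimal sketch of roughly what one bar and its error bar represent: the estimator applied to a group's values, plus a bootstrap percentile interval around it. The sample values and the helper name `bootstrap_interval` are made up for illustration, and Seaborn's internal implementation may differ in detail.

```python
import numpy as np

def bootstrap_interval(values, estimator=np.median, level=50, n_boot=1000, seed=0):
    """Return (estimate, low, high) using a percentile bootstrap.

    Hypothetical helper for illustration; not part of Seaborn's API.
    """
    values = np.asarray(values, dtype=float)
    rng = np.random.default_rng(seed)
    # Resample with replacement and apply the estimator to each resample
    boots = np.array([
        estimator(rng.choice(values, size=values.size, replace=True))
        for _ in range(n_boot)
    ])
    # A 50% interval spans the 25th to 75th percentile of the bootstrap estimates
    tail = (100 - level) / 2
    low, high = np.percentile(boots, [tail, 100 - tail])
    return estimator(values), low, high

# Made-up body masses (grams) for a single group
masses = [3700, 3800, 3250, 3450, 3650, 3625, 4675, 3475, 4250, 3300]
estimate, low, high = bootstrap_interval(masses)
print(f"median={estimate:.0f}, 50% interval=({low:.0f}, {high:.0f})")
```

A narrower `level` (such as 50) gives shorter error bars than the default 95, which is why the two settings should not be compared directly across charts.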

Similar to a bar plot, but counts how many times each category occurs.
# Count plot: frequency of species
sns.countplot(data=df_penguins, x="species")
plt.title("Count Plot: Number of Penguins by Species")
plt.show()
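
The height of each bar in a count plot is simply the number of rows in that category. A small sketch with made-up labels (standing in for a "species" column) shows the same tally without plotting:

```python
from collections import Counter

# Made-up category labels standing in for a "species" column
species = ["Adelie", "Gentoo", "Adelie", "Chinstrap", "Gentoo", "Adelie"]

# One bar per category; bar height = number of rows with that label
counts = Counter(species)
for name, n in counts.most_common():
    print(f"{name}: {n}")
```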

Use line plots to show trends over time.
Required:
data = dataframe: your dataset
x = 'fieldname of quantitative': quantitative variable for x-axis
y = 'fieldname of quantitative': quantitative variable for y-axis
Optional:
hue = 'fieldname of category': adds color grouping for each line. Error bands are drawn only when a line has several observations at the same x value.
estimator = np.mean: function to compute the value to be plotted (default is mean)
errorbar = ('ci', 95): None or a tuple with the confidence interval for the estimate (default is 95%)
# Lineplot
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np
# Load example dataset
df_flights = sns.load_dataset("flights")
# Line plot showing passengers per year
sns.lineplot(data=df_flights, x="year", y="passengers", errorbar=("ci", 95), estimator=np.mean)
plt.title("Line Plot: Passengers Over Time")
plt.show()
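
When several rows share the same x value (the flights data has one row per month, so twelve per year), the line is drawn through one aggregated point per x value. The sketch below uses made-up rows to show that grouping step with the mean as the estimator; it illustrates the idea and is not Seaborn's own code.

```python
from collections import defaultdict
from statistics import mean

# Made-up (year, passengers) rows; each year appears more than once
rows = [(2001, 100), (2001, 120), (2001, 140),
        (2002, 150), (2002, 170),
        (2003, 200), (2003, 220), (2003, 240)]

# Group the y values by their x value
by_year = defaultdict(list)
for year, passengers in rows:
    by_year[year].append(passengers)

# The drawn line connects one aggregated point per x value
line_points = {year: mean(values) for year, values in sorted(by_year.items())}
print(line_points)
```

The spread of the values inside each group is what the error band around the line summarizes.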

Show the relationship between two quantitative variables.
Required:
data = dataframe: your dataset
x = 'fieldname of quantitative': quantitative variable for x-axis
y = 'fieldname of quantitative': quantitative variable for y-axis
Optional:
hue = 'fieldname of category': adds color grouping for each point.
size = 'fieldname of size': adds size grouping for each point.
# Scatterplot
import seaborn as sns
import matplotlib.pyplot as plt
# Sample data
df_penguins = sns.load_dataset("penguins")
# Add a third and fourth variable (species as hue, body_mass_g as size)
sns.scatterplot(data=df_penguins, x="bill_length_mm", y="bill_depth_mm",
                hue="species", size="body_mass_g")
plt.title("Scatter Plot with Hue & Size")
plt.show()

Show medians, quartiles, and outliers.
Required:
data = dataframe: your dataset
x = 'fieldname of category': categorical variable
y = 'fieldname of quantitative': quantitative variable
Optional:
hue = 'fieldname of category': adds color grouping for each box
# Boxplot
import seaborn as sns
import matplotlib.pyplot as plt
df_penguins = sns.load_dataset("penguins")
# Box plot: flipper length across species
sns.boxplot(data=df_penguins, x="species", y="flipper_length_mm", hue="sex")
plt.title("Box Plot: Flipper Length by Species")
plt.show()
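
The numbers a box plot draws can be computed directly. The sketch below uses made-up flipper lengths and assumes the common 1.5 × IQR rule for flagging outliers; other whisker settings would give different fences.

```python
import numpy as np

# Made-up flipper lengths (mm) with unusually large values at the end
values = np.array([181, 186, 190, 191, 193, 195, 197, 198, 210, 240], dtype=float)

# The box spans Q1 to Q3, with a line at the median
q1, median, q3 = np.percentile(values, [25, 50, 75])
iqr = q3 - q1

# Assumed rule: points beyond 1.5 * IQR from the box are drawn as outliers
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = values[(values < lower_fence) | (values > upper_fence)]

print(f"Q1={q1}, median={median}, Q3={q3}, outliers={outliers}")
```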

Similar to a box plot, but shows more detail about the shape of the distribution.
# Violin plot
import seaborn as sns
import matplotlib.pyplot as plt
df_penguins = sns.load_dataset("penguins")
# Violin plot: flipper length across species
sns.violinplot(data=df_penguins, x="species", y="flipper_length_mm", hue="sex")
plt.title("Violin Plot: Flipper Length by Species")
plt.show()
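
The outline of a violin is a smoothed density curve. As a rough sketch, the code below builds such a curve by averaging Gaussian bumps centered on each observation. The data are made up, the helper `gaussian_kde_curve` is hypothetical, and the fixed bandwidth of 3.0 is an assumption (Seaborn chooses a bandwidth automatically). The sample has two clusters, which a box plot would summarize as a single box but the density curve shows as two bulges.

```python
import numpy as np

def gaussian_kde_curve(values, grid, bandwidth):
    """Average of Gaussian bumps centered on each observation (hypothetical helper)."""
    values = np.asarray(values, dtype=float)
    # Distance of every grid point from every observation, in bandwidth units
    z = (grid[:, None] - values[None, :]) / bandwidth
    bumps = np.exp(-0.5 * z**2) / (bandwidth * np.sqrt(2 * np.pi))
    return bumps.mean(axis=1)

# Made-up bimodal sample: two separate clusters of values
values = [180, 182, 184, 185, 187, 210, 212, 214, 215, 217]
grid = np.linspace(160, 240, 161)  # evaluation points, 0.5 apart
density = gaussian_kde_curve(values, grid, bandwidth=3.0)

# Local maxima of the curve correspond to the bulges of the violin
is_peak = (density[1:-1] > density[:-2]) & (density[1:-1] > density[2:])
peaks = grid[1:-1][is_peak]
print("peaks near:", peaks)
```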
