# Introduction to Seaborn

Use Seaborn, a Python data visualization library, to create bar charts for statistical analysis.

## Key Concepts

Review the core concepts you need to master this subject.

- Seaborn
- Estimator argument in barplot
- Seaborn barplot
- Barplot error bars
- Seaborn hue
- Seaborn function plots means by default
- Box and Whisker Plots in Seaborn
- Seaborn Package

### Seaborn

Seaborn is a Python data visualization library that builds on Matplotlib and integrates well with Pandas DataFrames. It provides a high-level interface for drawing statistical graphics and makes complex visualizations easier to create.
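Because Seaborn builds on Matplotlib, its plotting functions draw onto a Matplotlib `Axes` and return it, so ordinary Matplotlib methods still work on the result. The sketch below illustrates this. It assumes a small, invented DataFrame; the `city` and `temperature` columns are hypothetical.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical data, invented for illustration
weather = pd.DataFrame({
    "city": ["Oslo", "Oslo", "Lima", "Lima"],
    "temperature": [4.0, 6.0, 19.0, 21.0],
})

# Create a Matplotlib figure and axes, then let Seaborn draw on them
fig, ax = plt.subplots()
ax = sns.barplot(data=weather, x="city", y="temperature", ax=ax)

# The returned object is a plain Matplotlib Axes, so Matplotlib methods apply
ax.set_title("Mean temperature by city")
ax.set_ylabel("Temperature (°C)")
```

The DataFrame is passed in directly, and its column names select what goes on each axis.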

### Estimator argument in barplot

The `estimator` argument of Seaborn's `barplot()` function changes how the data is aggregated. By default, each bar in a barplot displays the mean value of a variable. Passing a different `estimator` makes each bar display a different statistic.

The `estimator` argument accepts a function such as `np.sum`, `len`, `np.median`, or another statistical function. Seaborn applies this function to the raw values in each category, such as a list of numbers, and each bar's height shows the resulting statistic.
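The following minimal sketch passes two of the functions named above to `estimator`. It assumes a small, invented DataFrame; the `group` and `value` columns are hypothetical. Group `a` contains one large value so the median and the mean differ noticeably.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical raw data: group "a" contains one large value (10)
df = pd.DataFrame({
    "group": ["a", "a", "a", "b", "b"],
    "value": [1, 2, 10, 4, 6],
})

fig, (ax_median, ax_count) = plt.subplots(1, 2, figsize=(8, 3))

# Bar height = median of each group instead of the default mean
sns.barplot(data=df, x="group", y="value", estimator=np.median, ax=ax_median)
ax_median.set_title("estimator=np.median")

# Bar height = number of rows in each group
sns.barplot(data=df, x="group", y="value", estimator=len, ax=ax_count)
ax_count.set_title("estimator=len")

# Read back the bar heights Seaborn drew
median_heights = [bar.get_height() for bar in ax_median.patches]
count_heights = [bar.get_height() for bar in ax_count.patches]
```

With the default mean, the bar for group `a` would be about 4.33 because the value 10 pulls it up. The median (2) is less sensitive to that outlier.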

### Seaborn barplot

In Seaborn, you can draw a barplot with the function `sns.barplot()`. This function takes the parameters `data`, `x`, and `y`. It plots a barplot using `data` as the DataFrame, or dataset, for the plot. `x` is the column of the DataFrame that contains the labels for the x-axis, and `y` is the column that contains the data to graph (that is, what ends up on the y-axis).

Using the Seaborn sample dataset “tips”, we can draw a barplot with the days of the week as the x-axis labels and `total_bill` as the y-axis values:

`sns.barplot(data=tips, x="day", y="total_bill")`
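`sns.load_dataset("tips")` downloads the sample data from Seaborn's online data repository and therefore needs an internet connection. For that reason, the sketch below uses a small, hypothetical stand-in with the same `day` and `total_bill` column names; the values are invented. It shows that Seaborn uses the column names as axis labels and draws one bar per day.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical stand-in for the "tips" sample data (values invented)
tips = pd.DataFrame({
    "day": ["Thur", "Thur", "Fri", "Fri", "Sat", "Sat"],
    "total_bill": [12.0, 18.0, 15.0, 25.0, 20.0, 30.0],
})

fig, ax = plt.subplots()
# x holds the category labels; y holds the values aggregated into each bar
sns.barplot(data=tips, x="day", y="total_bill", ax=ax)

# Seaborn labels the axes with the column names
print(ax.get_xlabel(), ax.get_ylabel())
```

To plot the real dataset, replace the stand-in with `tips = sns.load_dataset("tips")`.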

### Barplot error bars

By default, Seaborn's `barplot()` function places error bars on the bar plot. Seaborn calculates these error bars as a bootstrapped confidence interval (95% by default).

To make the error bars show the standard deviation instead, set the parameter `ci="sd"`. In Seaborn 0.12 and later, `ci` is deprecated in favor of `errorbar`, and the equivalent setting is `errorbar="sd"`.

### Seaborn hue

For the Seaborn function `sns.barplot()`, the `hue` parameter adds a second categorical dimension to the plot. Within each x-axis category, the data is split into side-by-side bars, one for each level of the `hue` column.

Using the Seaborn sample dataset “tips”, we can draw a barplot with the days of the week as the x-axis labels, `total_bill` as the y-axis values, and the bars split by sex:

`sns.barplot(data=tips, x="day", y="total_bill", hue="sex")`

Here, `hue` divides each day's data into two bars based on “sex”: male and female.
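The following minimal sketch makes the `hue` split concrete. It uses a hypothetical stand-in for “tips” with invented values and compares the result with the same per-(day, sex) means computed directly in pandas.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical stand-in for "tips" (values invented for illustration)
tips = pd.DataFrame({
    "day": ["Thur"] * 4 + ["Fri"] * 4,
    "sex": ["Male", "Male", "Female", "Female"] * 2,
    "total_bill": [20.0, 22.0, 14.0, 16.0, 24.0, 26.0, 18.0, 20.0],
})

fig, ax = plt.subplots()
# hue splits each day's bar into one bar per level of "sex"
sns.barplot(data=tips, x="day", y="total_bill", hue="sex", ax=ax)

# The same numbers computed with pandas: one mean per (day, sex) pair
print(tips.groupby(["day", "sex"])["total_bill"].mean())
```

Seaborn also adds a legend titled with the `hue` column name so you can match bar colors to levels.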

### Seaborn function plots means by default

By default, the Seaborn function `sns.barplot()` plots the mean of each category on the x-axis.

For example, suppose a DataFrame `df` has a gender column and a satisfaction column. A barplot with gender on the x-axis and satisfaction on the y-axis shows the mean satisfaction for every gender in `df`.
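The following minimal sketch shows the default mean aggregation. It assumes an invented survey DataFrame; the `gender` and `satisfaction` column names are hypothetical. It checks the bar heights against a plain pandas `groupby` mean.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical survey data; "gender" and "satisfaction" are invented names
df = pd.DataFrame({
    "gender": ["female", "male", "female", "male", "female", "male"],
    "satisfaction": [4, 3, 5, 2, 3, 4],
})

fig, ax = plt.subplots()
# No estimator is given, so each bar shows the mean satisfaction per gender
sns.barplot(data=df, x="gender", y="satisfaction", ax=ax)

bar_heights = [bar.get_height() for bar in ax.patches]
# Same aggregation in pandas; sort=False keeps first-appearance order,
# which matches the order Seaborn uses for string categories
group_means = df.groupby("gender", sort=False)["satisfaction"].mean().tolist()
```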

### Box and Whisker Plots in Seaborn

A box and whisker plot shows a dataset's median, quartiles, and outliers. The line inside the box is the dataset's median, the lower and upper edges of the box mark the 1st and 3rd quartiles, and the “diamonds” show the dataset's outliers. By default, the whiskers extend to the most extreme points within 1.5 times the interquartile range of the box. With Seaborn, multiple datasets can be plotted as adjacent box and whisker plots for easier comparison.
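The following minimal sketch plots several datasets as adjacent boxes. It assumes three invented datasets stored in long form: one hypothetical column (`dataset`) names each dataset and another (`value`) holds its numbers. Dataset `C` includes one value, 40, that falls beyond its upper whisker, so it appears as an outlier marker.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Three hypothetical datasets in long form (values invented for illustration)
df = pd.DataFrame({
    "dataset": ["A"] * 6 + ["B"] * 6 + ["C"] * 8,
    "value": [1, 2, 3, 4, 5, 6,
              3, 4, 4, 5, 5, 6,
              2, 3, 3, 4, 4, 4, 5, 40],
})

fig, ax = plt.subplots()
# One box per dataset, side by side; points beyond the whiskers
# (1.5 * IQR by default) are drawn as individual outlier markers
sns.boxplot(data=df, x="dataset", y="value", ax=ax)
```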

### Seaborn Package

Seaborn is well suited to plotting variables and comparing their distributions. With this package, users can plot univariate and bivariate distributions. It offers capabilities beyond popular chart types such as the bar chart: Seaborn can show outliers, spread, and the lowest and highest points, which a traditional bar chart would not show.

1. In this lesson, you’ll learn how to use Seaborn to create bar charts for statistical analysis. Seaborn is a Python data visualization library that provides simple code to create elegant visualizat…
2. Throughout this lesson, you’ll use Seaborn to visualize a Pandas DataFrame. DataFrames contain data structured into rows and columns. DataFrames look similar to other data tables you may be famil…
3. Take a look at the file called **results.csv**. You’ll plot that data soon, but before you plot it, take a minute to understand the context behind that data, which is based on a hypothetical situat…
4. Seaborn can also calculate *aggregate statistics* for large datasets. To understand why this is helpful, we must first understand what an *aggregate* is. An aggregate statistic, or aggregate, is …
5. Recall our gradebook from the previous exercise:

   | student | assignment_name | grade |
   |-|-|-|
   | Amy | Assignment 1 | 75 |
   | Amy | Assignment 2 | 82 |
   | Bob | Assignment 1 | 99 |
   | Bob | Assignment 2 | 90 |
   | Chris | Assignm… | |

6. By default, Seaborn will place *error bars* on each bar when you use the `barplot()` function. Error bars are the small lines that extend above and below the top of each bar. Error bars visually in…
7. In most cases, we’ll want to plot the mean of our data, but sometimes, we’ll want something different:
   - If our data has many outliers, we may want to plot the *median*.
   - If our data is categorica…
8. Sometimes we’ll want to aggregate our data by multiple columns to visualize nested categorical variables. For example, consider our hospital survey data. The mean satisfaction seems to depend on…
9. In this lesson, you learned how to extend Matplotlib with Seaborn to create meaningful visualizations from data in DataFrames. You’ve also learned how Seaborn creates aggregated charts and how to c…

1. In this lesson, we will explore how to use Seaborn to graph multiple statistical distributions, including box plots and violin plots. Seaborn is optimized to work with large datasets: from …
2. Before we dive into these new charts, we need to understand why we’d want to use them. To best illustrate this idea, we need to revisit bar charts. We previously learned that Seaborn can quickly …
3. Bar plots can tell us what the mean of our dataset is, but they don’t give us any hints as to the distribution of the dataset values. For all we know, the data could be clustered around the mean or…
4. To plot a KDE in Seaborn, we use the method `sns.kdeplot()`. A KDE plot takes the following arguments:
   - `data`: the univariate dataset being visualized, like a Pandas DataFrame, Python list, or N…
5. While a KDE plot can tell us about the shape of the data, it’s cumbersome to compare multiple KDE plots at once. They also can’t tell us other statistical information, like the values of outliers. …
6. One advantage of the box plot over the KDE plot is that in Seaborn, it is easy to plot multiples and compare distributions. Let’s look again at our three datasets, and how they look plotted as bo…
7. As we saw in the previous exercises, while it’s possible to plot multiple histograms, it is not a great option for comparing distributions. Seaborn gives us another option for comparing distributio…
8. Violin plots are a powerful graphing tool that allows you to compare multiple distributions at once. Let’s look at what our original three datasets look like as violin plots: `sns.violinplot(data…`
9. In this lesson, we examined how Seaborn has several plots that can visualize distributions. While bar plots can display basic aggregates, KDE plots, dist plots, box plots, and violin plots can show…
