Intermediate Data Visualization in R
Box Plots

Box plots, also known as box-and-whisker plots, show the distribution of data by quartiles. Box plots are useful for showing how the distribution of one variable differs across the values of another variable: are most cases similar in value, or is there a wide range between the highest and lowest values?

In the box plot below, we see the distribution of temperatures for different months within a subset of the airquality dataset. As we would expect for New York City, the summer months have the highest temperatures. The line inside each box marks the median temperature. The upper and lower edges of the box show the 75th and 25th percentiles, respectively. Each whisker extends to the most extreme observation that lies within 1.5 times the interquartile range (the distance between the 75th and 25th percentiles) of the box edge. Observations beyond the whiskers are shown as individual points and treated as outliers.
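The quantities a box plot draws can also be computed by hand. The sketch below does this for one month, assuming R's built-in airquality data (which may differ from the subset used in this lesson). The variable names are illustrative only.

```r
# Box plot statistics for July temperatures in the built-in airquality data
july_temps <- airquality$Temp[airquality$Month == 7]

q1  <- quantile(july_temps, 0.25, names = FALSE)  # lower edge of the box
med <- median(july_temps)                         # line inside the box
q3  <- quantile(july_temps, 0.75, names = FALSE)  # upper edge of the box
iqr <- q3 - q1                                    # interquartile range

# Fences sit 1.5 * IQR beyond the box edges
lower_fence <- q1 - 1.5 * iqr
upper_fence <- q3 + 1.5 * iqr

# Whiskers stop at the most extreme observations inside the fences
lower_whisker <- min(july_temps[july_temps >= lower_fence])
upper_whisker <- max(july_temps[july_temps <= upper_fence])

# Anything outside the fences would be drawn as an individual point
outliers <- july_temps[july_temps < lower_fence | july_temps > upper_fence]
```

Printing q1, med, q3, lower_whisker, upper_whisker, and outliers shows the numbers behind a single box in the plot.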

Box Plot: Temperature by Month

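We can create a box plot using the geom_boxplot() layer. The code below creates a box plot like the one shown above, visualizing temperature by month in the airquality data. If Month is stored as a number, as it is in R's built-in airquality data, wrap it in factor() so that ggplot draws one box per month.

library(ggplot2)
airquality_boxplot <- ggplot(airquality, aes(x = factor(Month), y = Temp)) + geom_boxplot() + labs(title = "Air Quality: Temperature by Month", x = "Month")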

Note that box plots show medians, not means. We’ll cover how to display mean values using bar plots later in this lesson.
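To see how the two summaries can differ, the sketch below compares the median and mean temperature for each month. It assumes R's built-in airquality data and uses base R only.

```r
# Median and mean temperature per month, side by side
monthly_median <- tapply(airquality$Temp, airquality$Month, median)
monthly_mean   <- tapply(airquality$Temp, airquality$Month, mean)

# One row per month; the line in each box corresponds to the median column
round(cbind(median = monthly_median, mean = monthly_mean), 1)
```

When a month's temperatures are skewed or contain outliers, the mean is pulled toward the extreme values while the median is not.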

Instructions

1. Construct a box plot object called rideshare_boxplot visualizing the cost of trips in rideshare_df by month, using the Trip.Total and Month variables. In your aes() mapping, transform Month to a factor (x = factor(Month)) so that ggplot knows to treat each month as a discrete value, rather than a continuous number.

Print the rideshare_boxplot object to see what it looks like. Notice what information is depicted in a box plot, compared to what would be included in a bar plot depicting the same data.
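One way to approach this step is sketched below. The real rideshare_df is not reproduced here, so the data frame is a small made-up stand-in: only the column names Trip.Total and Month come from the instructions, and every value is hypothetical.

```r
library(ggplot2)

# Hypothetical stand-in for rideshare_df (made-up values for illustration)
rideshare_df <- data.frame(
  Month = rep(c(1, 2, 3), each = 5),
  Trip.Total = c(8.5, 10.0, 12.5, 15.0, 40.0,
                 7.5,  9.0, 11.0, 13.5, 16.0,
                 9.5, 12.0, 14.0, 18.0, 22.5)
)

# factor(Month) makes ggplot treat each month as a discrete group,
# so one box is drawn per month
rideshare_boxplot <- ggplot(rideshare_df, aes(x = factor(Month), y = Trip.Total)) +
  geom_boxplot() +
  labs(x = "Month", y = "Trip total")

# Printing the object renders the plot
print(rideshare_boxplot)
```

Each box summarizes the whole distribution of trip costs for a month (median, quartiles, and any outlying trips), whereas a bar plot of the same data would typically show a single summary value per month.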
