Variance and Standard Deviation

Variance

Variance is a measure of spread. It is calculated by finding the average of the squared differences between every observation and the mean. The resulting value is in units squared.
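The definition above can be followed step by step in code. This is a minimal sketch rather than a canonical implementation; the variable names `mean`, `squared_diffs`, and `manual_variance` are chosen here for illustration, and the sample values are the ones used in this cheatsheet's NumPy examples.

```python
import numpy as np

values = np.array([1, 3, 4, 2, 6, 3, 4, 5])

# Step 1: find the mean of the observations
mean = values.mean()

# Step 2: square the difference between each observation and the mean
squared_diffs = (values - mean) ** 2

# Step 3: average the squared differences
manual_variance = squared_diffs.mean()

print(manual_variance)  # 2.25
```

Because the differences are squared, the result is expressed in squared units of the original data.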

Interpretation of Variance

A larger variance means the data is more spread out and values tend to be far away from the mean. A variance of 0 means all values in the dataset are the same.
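As a rough illustration of this interpretation, the sketch below compares three small datasets using NumPy's var() function. The arrays `tight`, `wide`, and `constant` are made-up examples, not data from the original lesson; the first two share the same mean of 10.

```python
import numpy as np

# Hypothetical datasets chosen for illustration
tight = np.array([9, 10, 10, 11])     # values close to the mean
wide = np.array([0, 5, 15, 20])       # values far from the mean
constant = np.array([7, 7, 7, 7])     # all values identical

print(np.var(tight))     # 0.5
print(np.var(wide))      # 62.5
print(np.var(constant))  # 0.0
```

The dataset whose values sit further from the mean has the larger variance, and the dataset with identical values has a variance of 0.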

Calculating Variance in Python

In Python, we can calculate the variance of an array using the NumPy var() function. By default, var() divides the sum of squared differences by the number of values.

    import numpy as np

    values = np.array([1, 3, 4, 2, 6, 3, 4, 5])

    # calculate variance of values
    variance = np.var(values)

Standard Deviation

The standard deviation is a measure of a dataset’s spread. It is calculated by taking the square root of the variance of the dataset. The resulting value has the same units as the original data.
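The relationship between the two measures can be checked directly. This is a small sketch using the same sample values as the other examples in this cheatsheet; the name `std_dev` is chosen here for illustration.

```python
import numpy as np

values = np.array([1, 3, 4, 2, 6, 3, 4, 5])

# Variance of the values (in squared units)
variance = np.var(values)

# Standard deviation is the square root of the variance (original units)
std_dev = np.sqrt(variance)

print(variance)  # 2.25
print(std_dev)   # 1.5
```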

Standard Deviation Units

Because standard deviation is in the same units as the original dataset, it is often used to provide context for the mean. For example, if the dataset is [3, 5, 10, 14], the mean is 8.0 units and the standard deviation is approximately 4.301 units. The data point 14 lies 6 units above the mean, so we can fairly easily see that it is more than one standard deviation away from the mean.
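The numbers in this example can be reproduced with a short sketch. The names `distances` and `outside` are illustrative choices, and measuring each point's distance from the mean in standard deviations is one possible way to express the comparison, not the only one.

```python
import numpy as np

data = np.array([3, 5, 10, 14])

mean = data.mean()       # 8.0
std_dev = np.std(data)   # approximately 4.301

# Distance of each point from the mean, measured in standard deviations
distances = np.abs(data - mean) / std_dev

# Points lying more than one standard deviation from the mean
outside = data[distances > 1]

print(round(std_dev, 3))  # 4.301
print(outside)            # [ 3 14]
```

Under this check, the point 3 also falls more than one standard deviation from the mean, since it lies 5 units below it.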

Calculating Standard Deviation in Python

We can calculate standard deviation in Python using the NumPy std() function.

    import numpy as np

    values = np.array([1, 3, 4, 2, 6, 3, 4, 5])

    # calculate standard deviation of values
    std_dev = np.std(values)
