Hypothesis Testing with R
One Sample T-Test

Consider the fictional business BuyPie, which sends ingredients for pies to your household so that you can make them from scratch. Suppose that a product manager hypothesizes the average age of visitors to BuyPie.com is 30. In the past hour, the website had 100 visitors and the average age was 31. Are the visitors older than expected? Or is this just the result of chance (sampling error) and a small sample size?

You can test this using a One Sample T-Test. A One Sample T-Test compares a sample mean to a hypothetical population mean. It answers the question “If the population mean really were the hypothesized value, how likely would we be to see a sample mean at least this far from it?”

The first step is formulating a null hypothesis, which again is the hypothesis that there is no difference between the populations you are comparing. In a One Sample T-Test, one population is the one your sample was drawn from and the other is the hypothetical population you choose. The null hypothesis that this test examines can be phrased as follows: "The sample comes from a population whose mean equals the target mean".

One result of a One Sample T-Test will be a p-value, which tells you whether or not you can reject this null hypothesis. If the p-value you receive is less than your significance level, conventionally 0.05, you can reject the null hypothesis and state that the difference between the sample mean and the hypothesized mean is statistically significant.

R has a function called t.test() in the stats package which can perform a One Sample T-Test for you.

For a One Sample T-Test, pass t.test() two arguments, a vector of sample values and an expected mean (if you omit mu, it defaults to 0):

results <- t.test(sample_distribution, mu = expected_mean)
  • sample_distribution is the numeric vector of values that were collected
  • mu is the argument that sets the hypothesized mean of the population
  • expected_mean is the value of that hypothesized mean (30 in the BuyPie example)
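
As a minimal sketch of the call above, the snippet below runs a One Sample T-Test on a small vector of ages. The values in sample_ages are invented for illustration and are not the lesson's ages dataset; the variable names are assumptions as well.

```r
# Hypothetical ages, invented for illustration (not the lesson's ages dataset)
sample_ages <- c(28, 34, 31, 29, 36, 30, 33, 27, 35, 32)

# Hypothesized population mean from the BuyPie example
expected_mean <- 30

# One Sample T-Test: compare the mean of sample_ages to expected_mean
results <- t.test(sample_ages, mu = expected_mean)

# Printing the result shows the t statistic, degrees of freedom,
# p-value, confidence interval, and the sample mean
print(results)
```

The printed summary includes the sample mean alongside the p-value, which makes it easy to check the test against the raw numbers.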

t.test() will return, among other information we will not cover here, a p-value. This is the probability of observing a sample mean at least as far from the specified mean as yours, assuming the sample really did come from a distribution with that mean.
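
The object returned by t.test() is a list, so the p-value can be pulled out by name and compared with a significance level. The sketch below assumes a made-up vector of ages and a significance level of 0.05; neither comes from the lesson's dataset.

```r
# Hypothetical ages, invented for illustration
sample_ages <- c(28, 34, 31, 29, 36, 30, 33, 27, 35, 32)

results <- t.test(sample_ages, mu = 30)

# The p-value is stored in the p.value element of the result
p_value <- results$p.value

# Significance level chosen before looking at the data
alpha <- 0.05

if (p_value < alpha) {
  message("Reject the null hypothesis: the mean differs significantly from 30")
} else {
  message("Fail to reject the null hypothesis: no significant difference detected")
}
```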

A p-value measures how surprising your data would be if the null hypothesis were true; it does not prove the null hypothesis when it is large. Just because you don’t have enough data to detect a difference doesn’t mean that there isn’t one. Generally, the more observations you have, the smaller a difference you can detect.
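
To illustrate the effect of sample size, the sketch below tests two hypothetical samples with the same mean (31) against a hypothesized mean of 30. The larger sample is built by repeating the smaller one, which real sampling would never produce; it is only a device to hold the mean fixed while the number of observations grows.

```r
# Hypothetical small sample with mean 31
small_sample <- c(29, 31, 33, 30, 32)

# Artificial larger sample: the same values repeated 20 times (100 observations)
large_sample <- rep(small_sample, times = 20)

small_test <- t.test(small_sample, mu = 30)
large_test <- t.test(large_sample, mu = 30)

# Both samples have the same mean, but the p-values differ
mean(small_sample)
mean(large_sample)
small_test$p.value
large_test$p.value
```

With only five observations the one-year difference is not significant at the 0.05 level, while with 100 observations the same difference is.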

Instructions

1.

We have provided a small dataset called ages, representing the ages of visitors to BuyPie.com in the past hour, in notebook.Rmd.

Even with a small dataset like this, it is hard to make judgments from just looking at the numbers.

To understand the data better, let’s look at the mean. Calculate the mean of ages, and store the result in a variable called ages_mean. View ages_mean.

2.

Use the t.test() function with ages to see what p-value the test returns for this sample, where we expect the mean to be 30.

Store the results of the test in a variable called results.

Does the p-value you got with the One Sample T-Test make sense, knowing the mean of ages?
