Skip to Content
Learn
Joining Tables in R
Left and Right Joins

Let’s return to the join of Company A and Company B.


Left Join

Suppose we want to identify which customers are missing phone information. We would want a list of all customers who have email, but don’t have phone.

We could get this by performing a Left Join. A Left Join includes all rows from the first (left) table, but only rows from the second (right) table that match the first table.

For this command, the order of the arguments matters. If the first data frame is company_a and we do a left join, we’ll only end up with rows that appear in company_a.

By listing company_a first, we get all customers from Company A, and only customers from Company B who are also customers of Company A.

left_joined_df <- company_a %>% left_join(company_b)

The result would look like this:

name email phone
Sally Sparrow [email protected] NA
Peter Grant [email protected] 212-345-6789
Leslie May [email protected] 626-987-6543

Now let’s say we want a list of all customers who have phone but no email. We can do this by performing a Right Join.


Right Join

A Right Join is the exact opposite of left join. Here, the joined table will include all rows from the second (right) table, but only rows from the first (left) table that match the second table.

By listing company_a first and company_b second, we get all customers from Company B, and only customers from Company A who are also customers of Company B.

right_joined_df <- company_a %>% right_join(company_b)

The result would look like this:

name email phone
Peter Grant [email protected] 212-345-6789
Leslie May [email protected] 626-987-6543
Aaron Burr NA 303-456-7891

Instructions

1.

Let’s return to the two hardware stores, Store A and Store B. They’re not quite sure if they want to merge into a big Super Store just yet.

Store A wants to find out what products they carry that Store B does not carry. Using a left join, combine store_a to store_b and save the results to left_a_b.

The items with NA in left_a_b are carried by Store A, but not by Store B.

2.

Now, Store B wants to find out what products they carry that Store A does not carry. Use a left join, to combine the two data frames but in the reverse order (i.e., store_b followed by store_a) and save the results to the variable left_b_a.

Which items are not carried by Store A, but are carried by Store B?


What do you notice about these two data frames?

How are they different?

How are they the same?

Folder Icon

Take this course for free

Already have an account?