Learn
Joining Tables in R
Left and Right Joins

### Left Join

Suppose we want to identify which customers are missing phone information. We would want a list of all customers who have `email`, but don’t have `phone`.

We could get this by performing a Left Join. A Left Join includes all rows from the first (left) table, but only rows from the second (right) table that match the first table.

For this command, the order of the arguments matters. If the first data frame is `company_a` and we do a left join, we’ll only end up with rows that appear in `company_a`.

By listing `company_a` first, we get all customers from Company A, and only customers from Company B who are also customers of Company A.

``````left_joined_df <- company_a %>%
left_join(company_b)``````

The result would look like this:

name email phone
Sally Sparrow [email protected] `NA`
Peter Grant [email protected] 212-345-6789
Leslie May [email protected] 626-987-6543

Now let’s say we want a list of all customers who have `phone` but no `email`. We can do this by performing a Right Join.

### Right Join

A Right Join is the exact opposite of left join. Here, the joined table will include all rows from the second (right) table, but only rows from the first (left) table that match the second table.

By listing `company_a` first and `company_b` second, we get all customers from Company B, and only customers from Company A who are also customers of Company B.

``````right_joined_df <- company_a %>%
right_join(company_b)``````

The result would look like this:

name email phone
Peter Grant [email protected] 212-345-6789
Leslie May [email protected] 626-987-6543
Aaron Burr `NA` 303-456-7891

### Instructions

1.

Let’s return to the two hardware stores, Store A and Store B. They’re not quite sure if they want to merge into a big Super Store just yet.

Store A wants to find out what products they carry that Store B does not carry. Using a left join, combine `store_a` to `store_b` and save the results to `left_a_b`.

The items with `NA` in `left_a_b` are carried by Store A, but not by Store B.

2.

Now, Store B wants to find out what products they carry that Store A does not carry. Use a left join, to combine the two data frames but in the reverse order (i.e., `store_b` followed by `store_a`) and save the results to the variable `left_b_a`.

Which items are not carried by Store A, but are carried by Store B?

What do you notice about these two data frames?

How are they different?

How are they the same?