Skip to Content
Learn
Working with Multiple DataFrames
Merge on Specific Columns

In the previous example, the merge function “knew” how to combine tables based on the columns that were the same between two tables. For instance, products and orders both had a column called product_id. This won’t always be true when we want to perform a merge.

Generally, the products and customers DataFrames would not have the columns product_id or customer_id. Instead, they would both be called id and it would be implied that the id was the product_id for the products table and customer_id for the customers table. They would look like this:

Customers

idcustomer_nameaddressphone_number
1John Smith123 Main St.212-123-4567
2Jane Doe456 Park Ave.949-867-5309
3Joe Schmo798 Broadway112-358-1321

Products

iddescriptionprice
1thing-a-ma-jig5
2whatcha-ma-call-it10
3doo-hickey7
4gizmo3

**How would this affect our merges?**

Because the id columns would mean something different in each table, our default merges would be wrong.

One way that we could address this problem is to use .rename to rename the columns for our merges. In the example below, we will rename the column id to customer_id, so that orders and customers have a common column for the merge.

pd.merge( orders, customers.rename(columns={'id': 'customer_id'}))

Instructions

1.

Merge orders and products using rename. Save your results to the variable orders_products.

2.

Display orders_products using print.

Folder Icon

Sign up to start coding

Already have an account?