Introduction to Pandas
Use Pandas to create and manipulate tables so that you can process your data faster and get your insights sooner.
Key Concepts
Review core concepts you need to learn to master this subject
Pandas DataFrame creation
Pandas
Selecting Pandas DataFrame rows using logical operators
Pandas apply() function
Pandas DataFrames adding columns
Pandas DataFrame creation
import pandas as pd

# Ways of creating a Pandas DataFrame
# Passing in a dictionary (keys become column names, lists become column values):
data = {'name': ['Anthony', 'Maria'], 'age': [30, 28]}
df = pd.DataFrame(data)
# Passing in a list of lists (each inner list is one row):
data = [['Tom', 20], ['Jack', 30], ['Meera', 25]]
df = pd.DataFrame(data, columns=['Name', 'Age'])
# Reading data from a CSV file (students.csv must exist in the working directory):
df = pd.read_csv('students.csv')
The fundamental Pandas object is called a DataFrame. It is a two-dimensional, size-mutable, potentially heterogeneous tabular data structure: its columns can hold different data types, and columns can be added or removed after the DataFrame is created.
A DataFrame can be created in several ways: by passing a dictionary or a list of lists to the pd.DataFrame() constructor, or by reading data from a CSV file with pd.read_csv().
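Because students.csv is not included on this page, the sketch below substitutes an in-memory string wrapped in io.StringIO so that all three creation methods run on their own. pd.read_csv accepts file-like objects as well as file paths; the CSV contents (the name and grade columns) are invented for illustration.

```python
import io

import pandas as pd

# Dictionary: each key becomes a column name, each list holds that column's values
from_dict = pd.DataFrame({'name': ['Anthony', 'Maria'], 'age': [30, 28]})

# List of lists: each inner list is one row; column names come from `columns`
from_rows = pd.DataFrame(
    [['Tom', 20], ['Jack', 30], ['Meera', 25]],
    columns=['Name', 'Age'],
)

# read_csv also accepts a file-like object, so an in-memory string stands in
# for a real file such as students.csv (these CSV contents are made up)
csv_text = "name,grade\nAlice,90\nBob,85\n"
from_csv = pd.read_csv(io.StringIO(csv_text))

print(from_dict.shape)             # (2, 2): two rows, two columns
print(from_rows.columns.tolist())  # ['Name', 'Age']
print(from_csv)
```

In a real project you would pass a path such as 'students.csv' instead of the StringIO object; the resulting DataFrame is the same either way.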
1. Pandas is a Python module for working with tabular data (i.e., data in a table with rows and columns). Pandas offers much of the same functionality as SQL or Excel, but adds the power of…
2. A DataFrame is an object that stores data as rows and columns. You can think of a DataFrame as a spreadsheet or as a SQL table. You can manually create a DataFrame or fill it with data from a CSV, …
3. You can also add data using lists. For example, you can pass in a list of lists, where each inner list represents a row of data. Use the keyword argument columns to pass a list of column names. df2 …
4. We now know how to create our own DataFrame. However, most of the time, we’ll be working with datasets that already exist. One of the most common formats for big datasets is the CSV. CSV (com…
5. When you have data in a CSV, you can load it into a DataFrame in Pandas using pd.read_csv(): pd.read_csv('my-csv-file.csv') In the example above, the pd.read_csv() function is called. The CSV file cal…
6. When we load a new DataFrame from a CSV, we want to know what it looks like. If it’s a small DataFrame, you can display it by typing print(df). If it’s a larger DataFrame, it’s helpful to be able…
7. Now we know how to create and load data. Let’s select parts of those datasets that are interesting or important to our analyses. Suppose you have the DataFrame called customers, which contains the…
8. When you have a larger DataFrame, you might want to select just a few columns. For instance, let’s return to a DataFrame of orders from ShoeFly.com: |id|first_name|last_name|email|shoe_type|sh…
9. Let’s revisit our orders from ShoeFly.com: |id|first_name|last_name|email|shoe_type|shoe_material|shoe_color| |-|-|-|-|-|-|-| |54791|Rebecca|Lindsay|RebeccaLindsay57…
10. You can also select multiple rows from a DataFrame. Here are a few more rows from ShoeFly.com’s orders DataFrame: |id|first_name|last_name|email|shoe_type|shoe_material|shoe_color| |-|-|-|-|-|…
11. You can select a subset of a DataFrame by using logical statements: df[df.MyColumnName == desired_column_value] We have a large DataFrame with information about our customers. A few of the many r…
12. You can also combine multiple logical statements, as long as each statement is in parentheses. For instance, suppose we wanted to select all rows where the customer’s age was under 30 or the cus…
13. Suppose we want to select the rows where the customer’s name is either “Martha Jones”, “Rose Tyler” or “Amy Pond”. |name|address|phone|age| |-|-|-|-| |Martha Jones|123 Main St.|234-567-8910|2…
14. When we select a subset of a DataFrame using logic, we end up with non-consecutive indices. This is inelegant and makes it hard to use .iloc[]. We can fix this using the method .reset_index(). F…
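Steps 6 through 10 describe inspecting a DataFrame and selecting columns and rows, but the excerpts are cut off. The sketch below shows common ways to do this with head(), info(), bracket selection, and .iloc. It assumes a small hypothetical orders table; the values are invented and are not ShoeFly.com's data.

```python
import pandas as pd

# Hypothetical orders data (invented for illustration)
orders = pd.DataFrame({
    'id': [1, 2, 3, 4],
    'first_name': ['Ana', 'Ben', 'Cy', 'Dee'],
    'shoe_type': ['clogs', 'boots', 'ballet flats', 'boots'],
    'shoe_color': ['red', 'black', 'navy', 'brown'],
})

# Inspect: print the first two rows, then a summary of columns and dtypes
print(orders.head(2))
orders.info()

# One column -> Series (bracket notation works for any column name)
shoe_types = orders['shoe_type']

# Several columns -> DataFrame; note the double brackets (a list of names)
names_and_colors = orders[['first_name', 'shoe_color']]

# Rows by integer position with .iloc (square brackets, not parentheses)
second_row = orders.iloc[1]     # a single row, returned as a Series
middle_rows = orders.iloc[1:3]  # rows at positions 1 and 2; the end is exclusive
```

Selecting with a single column name returns a Series, while a list of names (even a list of one) returns a DataFrame.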
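Steps 11 through 14 cover selecting rows with logical statements, combining conditions, matching against a list of values, and renumbering the result with .reset_index(). A minimal sketch follows; the customer ages and the city column are invented for illustration, and the isin() call is one common way to express the "name is either X, Y, or Z" selection from step 13.

```python
import pandas as pd

# Hypothetical customer data (ages and cities are invented for illustration)
customers = pd.DataFrame({
    'name': ['Martha Jones', 'Rose Tyler', 'Donna Noble', 'Amy Pond'],
    'city': ['Boston', 'Denver', 'Boston', 'Austin'],
    'age': [34, 22, 45, 27],
})

# Single condition: the comparison builds a boolean mask, one True/False per row
in_boston = customers[customers.city == 'Boston']

# Combined conditions: use | (or) and & (and), and wrap each condition in
# parentheses, because | and & bind more tightly than == and < in Python
young_or_denver = customers[(customers.age < 30) | (customers.city == 'Denver')]

# Membership test against a list of values with .isin()
picked = customers[customers.name.isin(['Martha Jones', 'Rose Tyler', 'Amy Pond'])]

# Filtering leaves gaps in the index (here 0, 1, 3); reset_index renumbers it.
# drop=True discards the old index instead of keeping it as a new column.
picked = picked.reset_index(drop=True)
print(picked)
```

Without drop=True, reset_index() keeps the old labels in a new column named index, which is sometimes useful for tracing rows back to the original DataFrame.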
How you'll master it
Stress-test your knowledge with quizzes that help commit syntax to memory
Creating, Loading, and Selecting Data with Pandas
Consider the following code that is intended to create a new DataFrame showing the grades of students in a class. Will this code create a valid DataFrame? If not, why?
Modifying DataFrames
Consider the following DataFrame showing the daily inventory and number of products sold at a local office supply store. You want to add a column to this DataFrame that shows how many of each item remain at the end of the day. Which of the following lines of code would accomplish this?

| |product|price|initial_inventory|number_sold|
|-|-|-|-|-|
|0|pencil-pack|0.05|35|12|
|1|pens-pack|3.10|15|14|
|2|notebook|5.00|10|3|
|3|tape-dispenser|4.25|20|18|
|4|stapler|3.50|8|3|
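The key concepts list mentions adding columns and the apply() function. The sketch below rebuilds the inventory table above and shows one way to add the remaining-stock column with element-wise subtraction. The product_upper and revenue columns are hypothetical additions, included only to illustrate apply() on a single column and on whole rows.

```python
import pandas as pd

# Inventory data from the table above
inventory = pd.DataFrame({
    'product': ['pencil-pack', 'pens-pack', 'notebook', 'tape-dispenser', 'stapler'],
    'price': [0.05, 3.10, 5.00, 4.25, 3.50],
    'initial_inventory': [35, 15, 10, 20, 8],
    'number_sold': [12, 14, 3, 18, 3],
})

# Column arithmetic is element-wise: each row's number_sold is subtracted
# from the same row's initial_inventory
inventory['remaining'] = inventory['initial_inventory'] - inventory['number_sold']

# .apply() on a Series runs a function on every value in that column
# ('product_upper' is a hypothetical column for illustration)
inventory['product_upper'] = inventory['product'].apply(str.upper)

# .apply() on a DataFrame with axis=1 passes each row to the function instead
# ('revenue' is a hypothetical column, not part of the original exercise)
inventory['revenue'] = inventory.apply(
    lambda row: row['price'] * row['number_sold'], axis=1
)
print(inventory[['product', 'remaining', 'revenue']])
```

For simple arithmetic like remaining or revenue, direct column operations are usually faster and clearer than apply(); apply() is most useful when the per-value or per-row logic cannot be written as a single vectorized expression.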