Data Cleaning in R
Looking at Data Types

Each column of a data frame holds items of a single data type. R's basic data types are character, numeric (real or decimal), integer, logical, and complex. Often, we want to convert between types so that the data is easier to analyze. For example, if a numeric column like "num_users" is stored as a vector of characters instead of numerics, it becomes harder to do something like make a line graph of users over time.
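As a small sketch of the types listed above, the snippet below asks R for the class of one value of each kind. The values and the num_users vector are made up for illustration; they are not part of the lesson's data.

```r
# One illustrative value per basic type (hypothetical values)
values <- list(
  character = "banana",
  numeric   = 105.5,
  integer   = 105L,
  logical   = TRUE,
  complex   = 2+3i
)

# class() reports the type of each value
types <- sapply(values, class)
print(types)

# A count stored as text is still a character vector
num_users <- c("10", "25", "40")
print(class(num_users))
print(is.numeric(num_users))
```

The last two lines show the situation described above: the values look like numbers, but R treats them as text until they are converted.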

To see the types of each column of a data frame, we can use:

str(df)

str() displays the internal structure of an R object. Calling str() with a data frame as an argument prints a variety of information, including the data type of each column. For a data frame like this:

item            price      calories
"banana"        "$1"       105
"apple"         "$0.75"    95
"peach"         "$3"       55
"clementine"    "$2.5"     35

the data types would be:

#> $ item: chr
#> $ price: chr
#> $ calories: num
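To try this yourself, the example data frame can be rebuilt by hand and passed to str(). This is a minimal sketch: the name df and the use of data.frame() are assumptions, since the lesson does not show how the table was created.

```r
# Rebuild the example table (the construction itself is an assumption)
df <- data.frame(
  item = c("banana", "apple", "peach", "clementine"),
  price = c("$1", "$0.75", "$3", "$2.5"),
  calories = c(105, 95, 55, 35),
  stringsAsFactors = FALSE  # keep text columns as character
)

# Print the structure, including each column's type
str(df)

# The same type information as a named vector
col_types <- sapply(df, class)
print(col_types)
```

The full str() output also includes the number of rows and a preview of the values in each column, next to the chr and num labels.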

We can see that the price column is made up of characters, which will probably make our analysis of price more difficult. We’ll look at how to convert columns into numeric values in the next few exercises.

Instructions

1.

Let’s inspect the data types in the students table.

Print out the structure of students.

2.

If we wanted to make a scatterplot of age vs average exam score, would we be able to do it with this type of data?

Paste the following code into the last code block to try to print the mean of the score column of students.

students %>% summarise(mean_score = mean(score))

What warning do you see?
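For comparison, here is a rough sketch of what happens when mean() is given a character vector. The students_demo data frame and its values are hypothetical stand-ins, not the lesson's students table, and the example calls base R's mean() directly instead of going through summarise().

```r
# Hypothetical stand-in for a table whose scores are stored as text
students_demo <- data.frame(
  name = c("A", "B", "C"),
  score = c("80%", "92%", "75%"),
  stringsAsFactors = FALSE
)

# Capture the warning message instead of letting it print
warning_text <- NULL
result <- withCallingHandlers(
  mean(students_demo$score),
  warning = function(w) {
    warning_text <<- conditionMessage(w)
    invokeRestart("muffleWarning")
  }
)

print(result)        # NA, because the column is not numeric
print(warning_text)  # the message mean() raised
```

Because mean() cannot average text, it returns NA and raises a warning instead of producing a number.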
