Learn R: Data Cleaning
Learn how to prepare data for analysis in R using dplyr and tidyr.
Key Concepts
Review core concepts you need to learn to master this subject
gsub() R Function
gather() tidyr
distinct() dplyr
str() Function
Combining Data with R
R as.numeric() Function
str_sub() function
The dplyr and tidyr packages
gsub() R Function
# Replace the element "1" with the empty string in the teams vector in order to get the teams_clean vector with the correct names.
teams <- c("Fal1cons", "Cardinals", "Seah1awks", "Vikings", "Bro1nco", "Patrio1ts")
teams_clean <- gsub("1", "", teams)
print(teams_clean)
# Output:
# "Falcons" "Cardinals" "Seahawks" "Vikings" "Bronco" "Patriots"
The base R gsub() function searches for a regular expression in a string and replaces every match. The function receives the pattern to search for, a replacement value, and the object to search. We can use it to replace substrings within a single string or in each string in a vector.
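Because the first argument is treated as a regular expression, a pattern can match more than one literal character. The following is a minimal sketch in base R; the vector names (codes, versions) and their values are made up for illustration.

```r
# Hypothetical vector: team names with stray digits typed into them
codes <- c("Fal1cons", "Bro22nco", "Vikings")

# "[0-9]" is a regular expression that matches any single digit
codes_clean <- gsub("[0-9]", "", codes)

# Hypothetical single string containing a literal dot
versions <- "a.b"

# In a regular expression "." matches any character, so everything is removed
all_removed <- gsub(".", "", versions)

# fixed = TRUE treats the pattern as plain text, so only the dot is removed
dot_removed <- gsub(".", "", versions, fixed = TRUE)

print(codes_clean)
print(all_removed)
print(dot_removed)
```

Passing fixed = TRUE is a reasonable choice whenever the text to replace contains characters such as "." or "(" that have a special meaning in regular expressions.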
When combined with dplyr’s mutate() function, gsub() can clean a column of a data frame to prepare it for analysis.
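As a rough sketch of that combination, the example below cleans one column of a small data frame. The data frame teams_df and its team and wins columns are hypothetical and exist only to illustrate the pattern.

```r
library(dplyr)

# Hypothetical data frame: some team names contain a stray "1"
teams_df <- data.frame(
  team = c("Fal1cons", "Cardinals", "Seah1awks"),
  wins = c(7, 4, 9),
  stringsAsFactors = FALSE
)

# mutate() overwrites the team column with the cleaned strings;
# gsub() is applied to every value in the column at once
teams_df_clean <- teams_df %>%
  mutate(team = gsub("1", "", team))

print(teams_df_clean)
```

Assigning the result back to the same column name replaces the original values; using a new name inside mutate() would keep both the raw and the cleaned column.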