Day 3
Dr. Elijah Meyer
Duke University
STA 199 - Summer 2023
May 22nd
– Clone ae-03-summer-your-user-name using the SSH key
Note: if you do not see an ae-03 specifically for you, you have not accepted your org invite
– AEs are being graded: we will demo how to turn in AEs today
– Keep up to date with Slack
– Finish off data visualization
– Introduce dplyr functions
– Continue R practice
Identify which plot each geom creates
– geom_point()
– geom_density()
– geom_boxplot()
– geom_bar()
What is the difference between |> and +?
penguins |>
  ggplot(aes(x = body_mass_g, fill = species)) +
  geom_histogram(binwidth = 200, alpha = 0.3)
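One way to see the difference, as a minimal sketch: |> hands the object on its left to the function on its right as that function's first argument, while + adds a layer to a plot that has already been started. The sketch assumes R 4.1 or later (for |>) and uses the built-in mtcars data instead of penguins so that only ggplot2 is needed; the object names are illustrative.

```r
library(ggplot2)

# |> passes the left-hand side in as the first argument,
# so these two lines do the same thing
n_piped  <- mtcars |> nrow()
n_direct <- nrow(mtcars)

# The same idea with a plot: the pipe supplies the data to ggplot(),
# then + adds the histogram layer to the plot object
p_piped <- mtcars |>
  ggplot(aes(x = mpg)) +
  geom_histogram(binwidth = 2)

# Equivalent plot written without the pipe
p_direct <- ggplot(mtcars, aes(x = mpg)) +
  geom_histogram(binwidth = 2)
```

Inside a ggplot() call, layers are joined with +; a pipe between layers is an error.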
Type is how an object is stored in memory.
– glimpse() is a great way to check data types
– Can also use typeof()
– glimpse(mtcars)
– typeof(mtcars$mpg)
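The two checks above can be run together. This is a small sketch that assumes dplyr is installed (it provides glimpse()); mpg_type and all_types are illustrative names.

```r
library(dplyr)  # provides glimpse()

# One line per column: name, abbreviated type (<dbl>, <chr>, ...), first values
glimpse(mtcars)

# typeof() reports how a single column is stored
mpg_type <- typeof(mtcars$mpg)
mpg_type

# Apply typeof() to every column at once
all_types <- sapply(mtcars, typeof)
all_types
```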
Some of the types of variables include:
– “logical”
– “integer”
– “double”
– “character”
– “factor”
– lgl in glimpse() (logi in str())
– The logical data type in R is also known as the boolean data type. It can only have two values: TRUE and FALSE.
– as.logical() can turn a variable into a logical. For numbers, 0 becomes FALSE and everything else becomes TRUE
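A short sketch of that conversion rule, using made-up numbers plus the am column of the built-in mtcars data. The variable names are illustrative.

```r
x <- c(0, 1, 2, -3.5)

# For numbers: 0 becomes FALSE, anything else becomes TRUE
x_lgl <- as.logical(x)
x_lgl

# Text is handled differently: only strings such as "TRUE" or "false"
# are recognized; anything else becomes NA
txt_lgl <- as.logical(c("TRUE", "false", "yes"))
txt_lgl

# In arithmetic TRUE counts as 1 and FALSE as 0,
# so sum() counts the TRUE values
n_true <- sum(as.logical(mtcars$am))
n_true
```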
– int in glimpse()
– Integers are whole numbers (numbers without a decimal point)
– as.integer() can turn a double into an integer by dropping the decimal part (no rounding): 22.8 -> 22
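A quick sketch of what "forces" means here: as.integer() drops the decimal part rather than rounding. The variable names are illustrative.

```r
whole   <- as.integer(22.8)    # 22: the .8 is dropped, not rounded
neg     <- as.integer(-22.8)   # -22: truncation is toward zero
nearest <- round(22.8)         # 23: nearest whole number, still stored as a double

# cyl holds whole numbers but is stored as a double in mtcars
typeof(mtcars$cyl)
cyl_int <- as.integer(mtcars$cyl)
typeof(cyl_int)
```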
– dbl in glimpse()
– Real numbers (can include decimals)
– as.double() can force a column to be a double. Identical to as.numeric().
– chr in glimpse()
– Character string (text)
– as.character() attempts to coerce its argument to character type
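A round trip between double and character, as a minimal sketch on the first three values of mtcars$cyl. The variable names are illustrative.

```r
# Numbers to text
cyl_chr <- as.character(mtcars$cyl[1:3])
cyl_chr

# Text that looks like a number converts back to a double
cyl_dbl <- as.double(cyl_chr)
cyl_dbl

# as.numeric() gives the same result as as.double()
same <- identical(as.double(cyl_chr), as.numeric(cyl_chr))

# Text that is not a number becomes NA (R also raises a warning,
# silenced here to keep the output clean)
not_a_number <- suppressWarnings(as.double("six"))
```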
– fct in glimpse()
– A factor is how R represents a categorical variable: it stores integer codes together with the labels (levels) those codes stand for.
– factor() attempts to coerce its argument to factor type
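A small sketch of what a factor stores, using the cyl column of the built-in mtcars data. The variable names are illustrative.

```r
cyl_fct <- factor(mtcars$cyl)

levels(cyl_fct)   # the categories, stored as text labels
class(cyl_fct)    # "factor"
typeof(cyl_fct)   # "integer": underneath, a factor is a set of integer codes

# Converting straight to integer returns the codes (1, 2, 3),
# not the original values (4, 6, 8)
codes <- as.integer(cyl_fct)

# Going through character first recovers the original numbers
values <- as.numeric(as.character(cyl_fct))
```

This is why glimpse() shows fct for a factor column while typeof() reports "integer".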
am: Transmission (0 = automatic; 1 = manual)
– Functions
– Plotting
– Summary statistics
– Can you identify variable types?
– Often need to turn something into a factor to make it categorical
– Often need to turn something into a double (numeric) to make it quantitative
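As a sketch of why the type matters, here is the am column from mtcars treated first as a double and then as a factor. The copy named cars, the column am_fct, and the label text are illustrative choices; the 0/1 coding follows the mtcars documentation.

```r
cars <- mtcars  # work on a copy so the built-in data stays unchanged

# Stored as a double, am is treated as a quantity:
# the mean is the share of manual cars
prop_manual <- mean(cars$am)

# Stored as a factor, am is treated as a category:
# functions return counts per group instead
cars$am_fct <- factor(cars$am, levels = c(0, 1),
                      labels = c("automatic", "manual"))
am_counts <- table(cars$am_fct)
am_counts
```

The same distinction carries over to plotting: mapping a factor to fill or color typically gives a discrete legend, while a double gives a continuous scale.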
– Want to subset
– Want to manipulate
– Want to create
… from data
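A preview of what these three goals can look like with dplyr, as a hedged sketch on the built-in mtcars data. The choice of filter(), select(), mutate(), and summarize() is an assumption about which functions fit each goal, and the object and column names are illustrative.

```r
library(dplyr)

manual_cars <- mtcars |>
  filter(am == 1) |>            # subset rows: manual cars only
  select(mpg, wt, cyl) |>       # subset columns
  mutate(wt_lb = wt * 1000)     # create a variable: wt is recorded in 1000 lbs

# Summary statistics from the manipulated data
manual_summary <- manual_cars |>
  summarize(n = n(), mean_mpg = mean(mpg))

manual_summary
```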
Themes are a powerful way to customize the non-data components of your plots: i.e. titles, labels, fonts, background, gridlines, and legends.
theme()
– https://ggplot2.tidyverse.org/reference/theme.html
– I often use it for legend manipulation… but there is so much more!
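A minimal sketch of legend manipulation with theme(), using the built-in mtcars data. The specific settings are illustrative choices, not recommendations from the slides.

```r
library(ggplot2)

p <- ggplot(mtcars, aes(x = wt, y = mpg, color = factor(cyl))) +
  geom_point() +
  labs(color = "Cylinders") +               # legend title
  theme(
    legend.position = "bottom",             # move the legend below the plot
    panel.grid.minor = element_blank(),     # remove minor gridlines
    plot.title = element_text(face = "bold")
  )

p
```

Setting legend.position = "none" removes the legend entirely.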
– Data types matter. Get in the habit of checking them at the beginning of an analysis
– Have the tools to create new variables, calculate summary statistics, etc. that accompany strong visualizations
– Have the tools to manipulate data to be in a more usable format