Day 4
Dr. Elijah Meyer
Duke University
STA 199 - Summer 2023
May 23rd, 2023
– Clone ae-04
– Make sure you are keeping up with Preparation Videos
– Lab-1 due Thursday (25th before class)
– HW-1 due Monday (29th at 11:59 PM)
– All AEs for this week due Friday (26th at 11:59 PM)
Videos
– Requesting videos for missed classes
Homework + Labs
– Late work policy
– Drop 1
Exam
– June 1st
– Take home
– Open Notes / Internet / etc
– Coding + Short answer questions
– Extension questions
– Can NOT be late
– Pull -> Commit -> Push after every question
– Posted in Slack #class-questions
– Reply under the comment
– Will guide warm up questions for Thursday
Please define the following…
– select()
– slice()
– arrange()
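As a reference point for the warm-up, here is a minimal sketch of what each verb does on its own. The `toy` tibble and its column names are made up for illustration and are not part of the course materials.

```r
library(dplyr)
library(tibble)

# Hypothetical toy data, made up for illustration
toy <- tibble(
  id    = c("a", "b", "c"),
  delay = c(5, 30, 12),
  extra = c(TRUE, FALSE, TRUE)
)

picked  <- toy |> select(id, delay)     # keep only the named columns
ordered <- toy |> arrange(desc(delay))  # sort rows, largest delay first
first2  <- toy |> slice(1:2)            # keep rows by position (rows 1 and 2)
```

In short: select() works on columns, while arrange() and slice() work on rows.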
We glossed over this on Monday… it needs more attention!
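We can chain multiple functions together with the pipe (|>): the result of each step is passed as the first argument of the next. Tidyverse functions are designed to work this way!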
library(tidyverse)
library(nycflights13) # assumption: flights is the data set from this package
flights |>
  select(tailnum, carrier, dep_delay) |>
  arrange(desc(dep_delay)) |>
  slice(1)
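To see what the pipe is doing, the same kind of chain can be written as nested function calls. This sketch uses the built-in `mtcars` data instead of `flights` so that it runs without extra data packages; the choice of columns is only for illustration.

```r
library(dplyr)

# Piped version: read top to bottom
piped <- mtcars |>
  select(mpg, cyl, hp) |>
  arrange(desc(hp)) |>
  slice(1)

# Equivalent nested version: read from the inside out
nested <- slice(arrange(select(mtcars, mpg, cyl, hp), desc(hp)), 1)

# Both approaches give the same one-row result
identical(piped, nested)
```

The piped version is usually easier to read because the steps appear in the order they are carried out.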
Why can’t I Render?
– Error in your code?
– Duplicate code chunk label?
– Are you using a function that brings up an external panel, such as View()?
Why are my changes not showing up in GitHub?
– Are you in the right project?
The project you are currently working in is shown in the top right corner of your screen. This MUST be the project that you cloned for the exam / assignment / lab. Do not use the Files tab to search for a file outside of your project repo.
In the console, type mtcars to print the data set
Now, type as_tibble(mtcars)
– A tibble is often considered a neater format of a data frame
– Tibble has a more advanced print function
– Tidyverse functions are built to work with tibbles
https://stackoverflow.com/questions/64856424/what-are-the-differences-between-data-frame-tibble-and-matrix
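The comparison above can be tried directly. A minimal sketch using the built-in `mtcars` data; the column name `model` is an arbitrary choice for illustration.

```r
library(tibble)

cars_tbl <- as_tibble(mtcars)

class(mtcars)    # a plain data frame
class(cars_tbl)  # a tibble is still a data frame, with extra classes

# Printing a tibble shows its dimensions, the column types,
# and only the first 10 rows
print(cars_tbl)

# Note: as_tibble() drops row names by default.
# To keep the car names, store them in a column instead:
cars_named <- as_tibble(mtcars, rownames = "model")
```

Because a tibble is still a data frame, functions that expect a data frame generally continue to work with it.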
fct_reorder
iris |>
  ggplot(
    aes(x = fct_reorder(Species, Sepal.Width), y = Sepal.Width)
  ) +
  geom_boxplot()
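fct_reorder() reorders the levels of a factor by a summary of another variable; by default that summary is the median. A small sketch that checks the level order behind the boxplot, outside of ggplot:

```r
library(forcats)

# Default level order of Species is alphabetical
levels(iris$Species)

# Reorder the levels by the median Sepal.Width within each species
# (.fun defaults to median)
reordered <- fct_reorder(iris$Species, iris$Sepal.Width)
levels(reordered)
```

Only the order of the levels changes; the values stored in the factor stay the same.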
– Play with dplyr functions
– Understand join functions
– Join multiple data frames
Messy data
– The sheer volume of information is sometimes referred to as “messy” data, because it’s hard to make sense of it all.
Data merging is the process of combining two or more data sets into a single data set. Most often, this process is necessary when you have raw data stored in multiple files, worksheets, or data tables, that you want to analyze together.
– Left Join
– Inner Join
– Right Join
– Full Join
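The four joins differ in which rows they keep. A minimal sketch, assuming two made-up tables `x` and `y` that share an `id` column (the tables and column names are hypothetical, not from ae-04):

```r
library(dplyr)
library(tibble)

# Hypothetical toy tables, made up for illustration
x <- tibble(id = c(1, 2, 3), value_x = c("x1", "x2", "x3"))
y <- tibble(id = c(1, 2, 4), value_y = c("y1", "y2", "y4"))

left  <- left_join(x, y, by = "id")   # all rows of x; unmatched y values become NA
right <- right_join(x, y, by = "id")  # all rows of y; unmatched x values become NA
inner <- inner_join(x, y, by = "id")  # only ids present in both x and y
full  <- full_join(x, y, by = "id")   # every id found in either table
```

Comparing nrow() across the four results is a quick way to check which rows a join kept or dropped.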
– ae-04 part 2!
– Joining Fisheries
– This is important! Data are messy!
– Think carefully about the join you use