Day-6
Dr. Elijah Meyer
Duke University
STA 199 - Summer 2023
May 30th, 2023
Clone your ae-06 project in RStudio
HW-2 is out. Due Thursday by 5:00
Lab-02 - Team Agreement on Gradescope
Are you in a project group?
Are you keeping up with prepare material?
– Exam 1 - June 1st
– Project is starting up!
– Find a data set
– Come up with a research question
– Perform EDA
– Perform Statistical Procedure
– Give a presentation
Can be found on the website
Identify 2 data sets you’re interested in potentially using for the final project.
– 300 observations
– 6 variables
Plenty of places to find data online
Data must be real
Identifier variables such as “name”, “social security number”, etc. are not useful explanatory variables. If you have multiple columns with the same information (e.g. “state abbreviation” and “state name”), then they are not unique explanatory variables.
You may not use data that has previously been used in any course materials, or any derivation of data that has been used in course materials.
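A quick size check on a candidate data set can be sketched with nrow() and ncol(). The helper name meets_size_guidelines is hypothetical, and treating 300 observations and 6 variables as minimums is an assumption about how the guideline is meant to be read. mtcars appears here only to show the call; it is not a valid project data set.

```r
# Hypothetical helper: compares a data frame against the size guidelines.
# Assumes the 300 / 6 figures are minimums.
meets_size_guidelines <- function(df, min_rows = 300, min_cols = 6) {
  nrow(df) >= min_rows && ncol(df) >= min_cols
}

# mtcars has 32 rows and 11 columns, so it has enough variables
# but too few observations.
meets_size_guidelines(mtcars)
```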
Introduction and data
Research question
Literature
Glimpse
Repos will be coming soon for each group!
– You must submit a PDF document to Gradescope
– With the exception of major emergencies, late submissions will not be accepted. A last-minute technical issue is not a major emergency.
– Include appropriate labels, titles, etc. when making any plot.
– This is an individual assignment.
– You may post clarification questions on Slack in the #exam-1 channel.
– Don’t cheat
– You may use R documentation, as well as course materials (notes and textbooks), or existing internet resources to answer exam questions. You may not, under any circumstances, use ChatGPT on the exam. Doing so will result in a 0.
– PDF not submitted on Gradescope (-10 points): If a PDF is not uploaded to Gradescope by the submission deadline, the PDF at your latest commit prior to the deadline will be used as your submission.
– If there is no PDF in your repo, i.e., you’ve never rendered your .qmd file, your work will not be graded and you will receive a 0 on the exam.
– Pages not marked on Gradescope (-10 points)
Will cover data viz and data wrangling
Questions will be similar
Render + Commit + Push after EVERY question
Wide vs Long Data
– What’s the difference?
new.blazer <- trailblazer |>
  pivot_longer(cols = !Player,
               names_to = "Game",
               values_to = "Points")

new.blazer |>
  pivot_wider(
    names_from = Game,
    values_from = Points
  )
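To make the wide versus long distinction concrete, here is a minimal sketch on a small made-up table with the same shape as trailblazer. The toy_wide data, including its player names and point values, is invented for illustration and is not the course data.

```r
library(tibble)
library(tidyr)

# Hypothetical wide data: one row per player, one column per game
toy_wide <- tibble(
  Player = c("Player A", "Player B"),
  Game1  = c(20, 15),
  Game2  = c(25, 10)
)

# Wide -> long: one row per player-game combination
toy_long <- toy_wide |>
  pivot_longer(cols = !Player,
               names_to = "Game",
               values_to = "Points")

# Long -> wide: back to one column per game
toy_back <- toy_long |>
  pivot_wider(names_from = Game,
              values_from = Points)

dim(toy_wide)  # rows and columns in the wide form
dim(toy_long)  # rows and columns in the long form
```

The long form has more rows and fewer columns. That is usually the shape ggplot2 expects when a variable such as Game should map to an aesthetic.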
Suppose a researcher wants to subset the mtcars data set to only include cars with 4 and 6 cylinders.
Two researchers set out to subset these data using the following code. What’s different? What’s correct?
Researcher 1:
cylinders <- c("6", "4")
mtcars |>
  mutate(cyl = factor(cyl)) |>
  filter(cyl == cylinders)
Researcher 2:
cylinders <- c("6", "4")
mtcars |>
  mutate(cyl = factor(cyl)) |>
  filter(cyl %in% cylinders)
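One way to see the difference between the two approaches is to count the rows each one keeps. This is a sketch, not the official key; the names mt_factor, n_equal, n_in, and n_reference are introduced here for illustration.

```r
library(dplyr)

cylinders <- c("6", "4")
mt_factor <- mtcars |>
  mutate(cyl = factor(cyl))

# `==` compares element by element and recycles `cylinders` down the column:
# row 1 is compared with "6", row 2 with "4", row 3 with "6", and so on.
n_equal <- mt_factor |>
  filter(cyl == cylinders) |>
  nrow()

# `%in%` asks, for each row, whether cyl matches ANY value in `cylinders`.
n_in <- mt_factor |>
  filter(cyl %in% cylinders) |>
  nrow()

# Reference count of 4- and 6-cylinder cars, taken from the original column
n_reference <- sum(mtcars$cyl %in% c(4, 6))

c(equal = n_equal, in_op = n_in, reference = n_reference)
```

Because mtcars has an even number of rows, the recycling in Researcher 1's code produces no warning, so the dropped rows are easy to miss. Only the %in% version keeps every 4- and 6-cylinder car.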
nrow() and ncol()
– You can see the project you are working in in the top right corner of your screen. This MUST be the project that you cloned for the exam / assignment / lab. Do not use the files tab to go search for a file outside of your project repo.
– External viewer error?
– Did you put View() in a code chunk? We don’t use View often, but we need to be aware that any function that calls for an external viewer will break the render.
– Something wrong with your code chunk arguments?
– The YAML is the metadata that tells Quarto exactly how to process or display the document. It sits in the first few lines of the document, between the two lines of three dashes (---).
– Does your code run? If you have errors in your code, you will also have errors when rendering the document.
The error message should give you an idea of where the error is occurring.
If you can’t find the error, go through the document question by question to locate it.
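For the external viewer problem mentioned above, one possible workaround is to guard View() with interactive(). This is a sketch and not a course requirement; it assumes the document is rendered in a non-interactive R session, which is how rendering normally runs.

```r
# View() opens an external viewer, which is not available during a render.
# interactive() is FALSE in a non-interactive session, so the call is skipped.
if (interactive()) {
  View(mtcars)
}

# Render-safe alternatives that print into the document instead
first_rows <- head(mtcars)
first_rows
str(mtcars)
```

Removing View() from the code chunk altogether and calling it from the console is the simpler option.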
– Help files (?function.name)
– https://ggplot2.tidyverse.org/reference/index.html
– Slack
– Keys
– We will start with the debugging qmd.