Day 9
Dr. Elijah Meyer
Duke University
STA 199 - Summer 2023
June 6th 2023
– Clone `ae-09
– Homework 3 Release Today
– Lab 4 due Thursday (11:59)
— 1 submission. Attach everyone to it
– Team Submission
– Attach ALL team members to submission on Gradescope
– Communicate!
– “It was my responsibility to turn the lab in and I forgot….”
– Proper notation?
– Definition?
– Bounds?
Challenge your friends to a game of guess the correlation
Last time, we fit a model that investigated the effect of Island on a penguin’s body mass. The results are below:
Let’s write out the estimated model together as a class…
Can find this with the cor
or correlate
function in R
https://www.tidyverse.org/blog/2020/12/corrr-0-4-3/
Define each of these terms below:
\(\beta_0\)
\(\beta_1\)
\(b_0\)
\(b_1\)
\(\epsilon\)
\(\hat{y}\)
estimates the relationship between a quantitative response variable and two or more explanatory variables
motivated by scenarios where many variables may be simultaneously connected to an output
– Holding all else constant
Because we estimate our coefficients all at the same time. Thus, when we add variables to the model, our estimates will change!
– Linear algebra is driving what we do in R
– Least squares estimator: estimating parameters by minimizing the squared discrepancies between observed data
– \[\hat{\beta} = (X^t X)^{-1} X^tY\]
In words….
The relationship between x and y do not change based on the values of z (additive)
The relationship between x and y DO change based on the values of z (interaction)
for a statistical model states that: a simpler model with fewer parameters is favored over more complex models with more parameters, provided the models fit the data similarly well
KEEP IT SIMPLE (when you can)
Many different ways
– Initial visual evidence
– R-squared & Adjusted R-squared
– statistical measure in a regression model that determines the proportion of variance in the response variable that can be explained by the explanatory variable(s).
– statistical measure in a regression model that determines the proportion of variance in the response variable that can be explained by the explanatory variable(s).
– The more variables you include, the larger the R-squared value will be (always)
Takeaway: Adds a penalty for “unimportant” predictors (x’s)