Day 10
Dr. Elijah Meyer
Duke University
STA 199 - Summer 2023
June 8th
– Clone ae-10
– Homework 3 Due Tuesday (6-13)
– Project Proposal due Monday (6-12)
— Turn in on GitHub
— Lab today is a project work day
– What is the main difference between Simple Linear Regression and Multiple Linear Regression?
– What is the main difference between an additive model and an interaction model?
– Understand R-squared vs Adjusted R-squared
– What, why and how of logistic regression
R-squared has…
A meaningful definition (the proportion of variance in the response explained by the model)
A relationship with the correlation coefficient in the SLR case: \(R^2 = r^2\)
When can we use R-squared for model selection?
When models have the same number of variables
Why can’t we use it to compare models with different numbers of variables? See ae-09 for a demonstration.
– statistical measure in a regression model that determines the proportion of variance in the response variable that can be explained by the explanatory variable(s).
– Adding more variables can never decrease R-squared; it always stays the same or increases, even if the new variables are unimportant
Adjusted R-squared …
Doesn’t have a clean definition
Is very useful for model selection
Takeaway: Adds a penalty for “unimportant” predictors (x’s)
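The penalty can be seen numerically. The course activities use R, but here is a language-neutral sketch in Python with simulated data (all names and numbers below are illustrative, not from class): fit a model with one real predictor, then add a pure-noise predictor, and compare \(R^2\) with adjusted \(R^2 = 1 - (1 - R^2)\frac{n-1}{n-p}\).

```python
import numpy as np

rng = np.random.default_rng(199)
n = 100
x1 = rng.normal(size=n)
y = 3 + 2 * x1 + rng.normal(scale=2, size=n)  # y truly depends only on x1
noise = rng.normal(size=n)                    # predictor unrelated to y

def r2_and_adj(cols, y):
    """Fit OLS via least squares; return (R-squared, adjusted R-squared)."""
    X = np.column_stack([np.ones(len(y))] + list(cols))
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    ss_res = resid @ resid
    ss_tot = ((y - y.mean()) ** 2).sum()
    r2 = 1 - ss_res / ss_tot
    n_obs, p = X.shape  # p counts the intercept column too
    adj = 1 - (1 - r2) * (n_obs - 1) / (n_obs - p)
    return r2, adj

r2_small, adj_small = r2_and_adj([x1], y)
r2_big, adj_big = r2_and_adj([x1, noise], y)
print("x1 only:      R2 =", r2_small, " adj R2 =", adj_small)
print("x1 + noise:   R2 =", r2_big, " adj R2 =", adj_big)
```

R-squared for the larger model can only go up, while adjusted R-squared pays a price for the useless predictor.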
– clone ae-10
– finish MLR (2 quantitative explanatory variables)
– The What, Why, and How of Logistic Regression
Similar to linear regression…. but
Modeling tool when our response is categorical
– This type of model is called a generalized linear model
– Bernoulli Distribution
2 outcomes: Success (p) or Failure (1-p)
\(y_i \sim \text{Bern}(p)\)
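A quick simulation makes the Bernoulli distribution concrete (a minimal sketch in Python; the value p = 0.3 is an arbitrary illustration):

```python
import random

random.seed(199)
p = 0.3  # hypothetical probability of success

# Draw 10,000 Bernoulli(p) outcomes: 1 = success, 0 = failure
draws = [1 if random.random() < p else 0 for _ in range(10_000)]
print(sum(draws) / len(draws))  # sample proportion, should be close to p
```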
We can use our explanatory variable(s) to model p
– 1: Define a linear model
– 2: Define a link function
\(\eta_i = \beta_0 + \beta_1 X_i + \dots\)
Note: \(\eta_i\) is the output of our linear model
But we can’t stop here… \(\eta_i\) isn’t the probability of success of our response
Think about what a linear model looks like
– Perform a transformation on our response so it has the appropriate range of values
– “Link” our linear model to the parameter of the outcome distribution
– \(y_i \sim \text{Bern}(p)\)
The logit link function is defined as follows:
\(\eta_i = \log\left(\frac{p_i}{1-p_i}\right)\)
– Note: log is in reference to natural log
– A logit link function transforms the probabilities of the levels of a categorical response variable to a continuous scale that is unbounded
Takes a probability in (0, 1) and maps it to log odds (\(-\infty\) to \(\infty\))
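The mapping is easy to see numerically (a minimal sketch in Python; the probabilities below are arbitrary examples):

```python
import math

def logit(p):
    """Log-odds: maps a probability in (0, 1) to (-inf, inf)."""
    return math.log(p / (1 - p))

# Probabilities near 0 map to large negative log odds,
# 0.5 maps to exactly 0, and probabilities near 1 map to large positive values.
for p in [0.01, 0.25, 0.5, 0.75, 0.99]:
    print(p, logit(p))
```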
This isn’t exactly what we need yet… but it will help us get to our goal
\(\text{logit}(p_i) = \widehat{\beta}_0 + \widehat{\beta}_1 X_{1i} + \dots\)
logit(p) is also known as the log-odds
logit(p) = \(log(\frac{p}{1-p})\)
\(\log\left(\frac{p}{1-p}\right) = \widehat{\beta}_0 + \widehat{\beta}_1 X_{1} + \dots\)
– Recall, the goal is to take values between -\(\infty\) and \(\infty\) and map them to probabilities. We need the opposite of the link function… or the inverse
– How do we take the inverse of a natural log?
We need to take the inverse of the logit function
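The inverse of the natural log is exponentiation, so solving the link equation for \(p_i\) gives the inverse logit:

\[
\log\left(\frac{p_i}{1-p_i}\right) = \eta_i
\;\Longrightarrow\;
\frac{p_i}{1-p_i} = e^{\eta_i}
\;\Longrightarrow\;
p_i = \frac{e^{\eta_i}}{1 + e^{\eta_i}}
\]

This takes any value in \((-\infty, \infty)\) and maps it back to a probability in (0, 1).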
Example Figure:
Calculate probabilities of success of a response based on values of explanatory variable x.
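A small sketch of that calculation in Python (the coefficients b0 and b1 are hypothetical placeholders, not fitted values from any class dataset):

```python
import math

# Hypothetical "fitted" coefficients, for illustration only
b0, b1 = -1.5, 0.8

def inv_logit(eta):
    """Inverse logit: maps log-odds back to a probability in (0, 1)."""
    return math.exp(eta) / (1 + math.exp(eta))

# Predicted probability of success at several values of x:
for x in [0, 1, 2, 3]:
    eta = b0 + b1 * x  # linear predictor on the log-odds scale
    print(x, round(inv_logit(eta), 3))
```

Note the probabilities increase with x here because b1 is positive, but they always stay between 0 and 1.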