$$\hat{\beta}_1 \sim N\left(E[\hat{\beta}_1], \sigma_{\hat{\beta}_1}\right)$$
+ $E[\hat{\beta}_1]$: the center of the distribution (last class)
+ $\sigma_{\hat{\beta}_1}$: how precise is our estimate? (today)
¹ Under the 4 assumptions about $u$ (particularly, $cor(X, u) = 0$).
+ Standard "error" is the analog of standard deviation when talking about the sampling distribution of a sample statistic (such as $\bar{X}$ or $\hat{\beta}_1$).
$$var(\hat{\beta}_1) = \frac{(SER)^2}{n \times var(X)}$$

$$se(\hat{\beta}_1) = \sqrt{var(\hat{\beta}_1)} = \frac{SER}{\sqrt{n} \times sd(X)}$$

+ Recall from last class, the Standard Error of the Regression $\hat{\sigma}_u = \sqrt{\dfrac{\sum \hat{u}_i^2}{n-2}}$
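To connect these formulas to R, here is a minimal sketch that computes $se(\hat{\beta}_1)$ by hand for the class-size regression below (it assumes `school_reg <- lm(testscr ~ str, data = CASchool)` has been estimated; since R's `sd()` divides by $n-1$, this reproduces `lm()`'s reported standard error only approximately):

```r
n   <- nobs(school_reg)                              # number of observations
SER <- sqrt(sum(residuals(school_reg)^2) / (n - 2))  # standard error of the regression
se_b1 <- SER / (sqrt(n) * sd(CASchool$str))          # the formula above
se_b1  # compare with the Std. Error on str in summary(school_reg)
```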
```r
summary(school_reg) # get full summary
```

```
## 
## Call:
## lm(formula = testscr ~ str, data = CASchool)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -47.727 -14.251   0.483  12.822  48.540 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 698.9330     9.4675  73.825  < 2e-16 ***
## str          -2.2798     0.4798  -4.751 2.78e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 18.58 on 418 degrees of freedom
## Multiple R-squared:  0.05124, Adjusted R-squared:  0.04897 
## F-statistic: 22.58 on 1 and 418 DF,  p-value: 2.783e-06
```
`broom`'s `tidy()` gives us a little bit neater table:

```r
library(broom)
tidy(school_reg)
```
| term        | estimate   | std.error | statistic | p.value       |
|-------------|------------|-----------|-----------|---------------|
| (Intercept) | 698.932952 | 9.4674914 | 73.824514 | 6.569925e-242 |
| str         | -2.279808  | 0.4798256 | -4.751327 | 2.783307e-06  |
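Because `tidy()` returns an ordinary data frame, we can manipulate it with standard tools; for example, a small sketch (assuming `dplyr` is loaded) that pulls out just the slope's standard error:

```r
library(dplyr)
tidy(school_reg) %>%
  filter(term == "str") %>%  # keep only the slope row
  pull(std.error)            # extract the standard error as a plain number
```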
`broom`'s `glance()` gives us summary statistics about the regression:

```r
library(broom)
glance(school_reg)
```
| r.squared | adj.r.squared | sigma    | statistic | p.value      | df | logLik   | AIC      | BIC     |
|-----------|---------------|----------|-----------|--------------|----|----------|----------|---------|
| 0.0512401 | 0.04897033    | 18.58097 | 22.57511  | 2.783307e-06 | 2  | -1822.25 | 3650.499 | 3662.62 |
Professional journals and papers often have a regression table, including:

+ Coefficient estimates with standard errors (in parentheses) and significance stars
+ Summary statistics such as N, R-squared, and the SER
+ Later: multiple columns for multiple models
|           | Test Score |
|-----------|------------|
| Intercept | 698.93 *** |
|           | (9.47)     |
| STR       | -2.28 ***  |
|           | (0.48)     |
| N         | 420        |
| R-Squared | 0.05       |
| SER       | 18.58      |

*** p < 0.001; ** p < 0.01; * p < 0.05.
You will need to first `install.packages("huxtable")`
Load with `library(huxtable)`
Command: `huxreg()`
Main argument is the name of your `lm` object
Default output is fine, but often we want to customize a bit
# install.packages("huxtable")library(huxtable)huxreg(school_reg)
|             | (1)         |
|-------------|-------------|
| (Intercept) | 698.933 *** |
|             | (9.467)     |
| str         | -2.280 ***  |
|             | (0.480)     |
| N           | 420         |
| R2          | 0.051       |
| logLik      | -1822.250   |
| AIC         | 3650.499    |

*** p < 0.001; ** p < 0.01; * p < 0.05.
"Test Score" = school_reg
"Test Score" = school_reg
coefs = c("Intercept" = "(Intercept)", "STR" = "str")
"Test Score" = school_reg
coefs = c("Intercept" = "(Intercept)", "STR" = "str")
statistics = c("N" = "nobs", "R-Squared" = "r.squared", "SER" = "sigma")
"Test Score" = school_reg
coefs = c("Intercept" = "(Intercept)", "STR" = "str")
statistics = c("N" = "nobs", "R-Squared" = "r.squared", "SER" = "sigma")
number_format = 2
huxreg("Test Score" = school_reg, coefs = c("Intercept" = "(Intercept)", "STR" = "str"), statistics = c("N" = "nobs", "R-Squared" = "r.squared", "SER" = "sigma"), number_format = 2)
huxreg("Test Score" = school_reg, coefs = c("Intercept" = "(Intercept)", "STR" = "str"), statistics = c("N" = "nobs", "R-Squared" = "r.squared", "SER" = "sigma"), number_format = 2)
|           | Test Score |
|-----------|------------|
| Intercept | 698.93 *** |
|           | (9.47)     |
| STR       | -2.28 ***  |
|           | (0.48)     |
| N         | 420        |
| R-Squared | 0.05       |
| SER       | 18.58      |

*** p < 0.001; ** p < 0.01; * p < 0.05.
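Once the table looks right, huxtable can export it directly, for instance with `quick_docx()` or `quick_latex()`; a small sketch (the file name below is just an illustration):

```r
# save the formatted table to a Word document (file name is illustrative)
quick_docx(
  huxreg("Test Score" = school_reg,
         coefs = c("Intercept" = "(Intercept)", "STR" = "str")),
  file = "school_reg_table.docx"
)
```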
`huxtable` is one package you can use. I used to only use `stargazer`, but as it was originally meant for Stata, it has limits and problems.
We often look at the residuals of a regression to get more insight about its goodness of fit and its bias
Recall, `broom`'s `augment()` creates some useful new variables:

+ `.fitted` are fitted (predicted) values from the model, i.e. $\hat{Y}_i$
+ `.resid` are residuals (errors) from the model, i.e. $\hat{u}_i$

```r
aug_reg <- augment(school_reg)
aug_reg %>% head()
```
| testscr | str  | .fitted | .se.fit | .resid | .hat    | .sigma | .cooksd  | .std.resid |
|---------|------|---------|---------|--------|---------|--------|----------|------------|
| 691     | 17.9 | 658     | 1.24    | 32.7   | 0.00442 | 18.5   | 0.00689  | 1.76       |
| 661     | 21.5 | 650     | 1.28    | 11.3   | 0.00475 | 18.6   | 0.000893 | 0.612      |
| 644     | 18.7 | 656     | 1.01    | -12.7  | 0.00297 | 18.6   | 0.0007   | -0.685     |
| 648     | 17.4 | 659     | 1.42    | -11.7  | 0.00586 | 18.6   | 0.00117  | -0.629     |
| 641     | 18.7 | 656     | 1.02    | -15.5  | 0.00301 | 18.6   | 0.00105  | -0.836     |
| 606     | 21.4 | 650     | 1.24    | -44.6  | 0.00446 | 18.5   | 0.013    | -2.4       |
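As a quick sanity check on these new columns, the fitted value plus the residual should reproduce each observed `testscr` exactly; a small sketch (assuming `dplyr` is loaded for `mutate()` and `summarize()`):

```r
aug_reg %>%
  mutate(check = .fitted + .resid - testscr) %>%  # should be ~0 for every row
  summarize(max_abs_diff = max(abs(check)))       # confirms Y_i = fitted + residual
```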
1. The expected value of the residuals is 0: $E[u] = 0$
2. The variance of the residuals over $X$ is constant: $var(u|X) = \sigma^2_u$
3. Errors are not correlated across observations: $cor(u_i, u_j) = 0 \;\forall\; i \neq j$
4. No correlation between $X$ and the error term: $cor(X, u) = 0$ or $E[u|X] = 0$
Assumptions 1 and 2 imply that errors come from the same (normal) distribution: $u \sim N(0, \sigma_u)$
We can often check this visually by plotting a histogram of $\hat{u}$
```r
ggplot(data = aug_reg)+
  aes(x = .resid)+
  geom_histogram(color = "white")+
  labs(x = expression(paste("Residual, ", hat(u))))+
  theme_bw(base_family = "Fira Sans Condensed",
           base_size = 20)
```
```r
aug_reg %>%
  summarize(E_u = mean(.resid),
            sd_u = sd(.resid))
```
| E_u       | sd_u |
|-----------|------|
| -5.76e-16 | 18.6 |
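We can also plot the residuals against the regressor `str` to check assumption 2 visually; the original plotting code was not preserved here, but a minimal `ggplot2` sketch in the same style as the histogram above might look like:

```r
ggplot(data = aug_reg)+
  aes(x = str, y = .resid)+                          # residuals vs. the regressor
  geom_point(color = "blue")+                        # styling choices are mine, not the original's
  geom_hline(yintercept = 0, linetype = "dashed")+   # reference line at u-hat = 0
  labs(x = "STR",
       y = expression(paste("Residual, ", hat(u))))+
  theme_bw(base_family = "Fira Sans Condensed",
           base_size = 20)
```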
[Scatterplot of the regression residuals (`.resid`) against `str`]

"Homoskedasticity": the variance of the residuals over $X$ is constant, written: $var(u|X) = \sigma^2_u$
Knowing the value of $X$ does not affect the variance (spread) of the errors
"Heteroskedasticity": the variance of the residuals over $X$ is NOT constant: $var(u|X) \neq \sigma^2_u$
This does not cause $\hat{\beta}_1$ to be biased, but it does cause the standard error of $\hat{\beta}_1$ to be incorrect
This does cause a problem for inference!
$$se(\hat{\beta}_1) = \sqrt{var(\hat{\beta}_1)} = \frac{SER}{\sqrt{n} \times sd(X)}$$
$$se(\hat{\beta}_1) = \sqrt{\frac{\sum_{i=1}^n (X_i - \bar{X})^2 \hat{u}_i^2}{\left[\sum_{i=1}^n (X_i - \bar{X})^2\right]^2}}$$
This is a heteroskedasticity-robust (or just "robust") method of calculating $se(\hat{\beta}_1)$
Don't memorize the formula; do learn what heteroskedasticity is and how it affects our model! (See the sketch below.)
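For the curious, a minimal sketch implementing the robust formula by hand; the $\sqrt{n/(n-2)}$ small-sample correction at the end is my assumption about what the `se_type = "stata"` option used later applies:

```r
x    <- CASchool$str
u    <- residuals(school_reg)   # u-hat
dev2 <- (x - mean(x))^2         # squared deviations of X
n    <- length(x)

se_hc0 <- sqrt(sum(dev2 * u^2) / sum(dev2)^2)  # the robust formula above
se_hc0 * sqrt(n / (n - 2))                     # ~0.52, close to lm_robust() below
```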
Our original scatterplot with regression line
Does the spread of the errors change over different values of `str`?
$$\widehat{wage}_i = \hat{\beta}_0 + \hat{\beta}_1 educ_i$$
|                    | Wage     |
|--------------------|----------|
| Intercept          | -0.90    |
|                    | (0.68)   |
| Years of Schooling | 0.54 *** |
|                    | (0.05)   |
| N                  | 526      |
| R-Squared          | 0.16     |
| SER                | 3.38     |

*** p < 0.001; ** p < 0.01; * p < 0.05.
Use `bptest()` from the `lmtest` package in R to run a (Breusch-Pagan) test for heteroskedasticity
# install.packages("lmtest")library("lmtest")bptest(school_reg)
## ## studentized Breusch-Pagan test## ## data: school_reg## BP = 5.7936, df = 1, p-value = 0.01608
# install.packages("lmtest")library("lmtest")bptest(wage_reg)
## ## studentized Breusch-Pagan test## ## data: wage_reg## BP = 15.306, df = 1, p-value = 9.144e-05
Heteroskedasticity is easy to fix with software that can calculate robust standard errors (using the more complicated formula above)
Easiest method is to use the `estimatr` package
Use the `lm_robust()` command (instead of `lm`) to run the regression
Set `se_type = "stata"` to calculate robust SEs using the formula above

```r
# install.packages("estimatr")
library(estimatr)
school_reg_robust <- lm_robust(testscr ~ str,
                               data = CASchool,
                               se_type = "stata")
school_reg_robust
```
```
##               Estimate Std. Error   t value      Pr(>|t|)   CI Lower
## (Intercept) 698.932952 10.3643599 67.436191 9.486678e-227 678.560192
## str          -2.279808  0.5194892 -4.388557  1.446737e-05  -3.300945
##               CI Upper  DF
## (Intercept) 719.305713 418
## str          -1.258671 418
```
```r
library(huxtable)
huxreg("Normal" = school_reg,
       "Robust" = school_reg_robust,
       coefs = c("Intercept" = "(Intercept)",
                 "STR" = "str"),
       statistics = c("N" = "nobs",
                      "R-Squared" = "r.squared",
                      "SER" = "sigma"),
       number_format = 2)
```
|           | Normal     | Robust     |
|-----------|------------|------------|
| Intercept | 698.93 *** | 698.93 *** |
|           | (9.47)     | (10.36)    |
| STR       | -2.28 ***  | -2.28 ***  |
|           | (0.48)     | (0.52)     |
| N         | 420        | 420        |
| R-Squared | 0.05       | 0.05       |
| SER       | 18.58      |            |

*** p < 0.001; ** p < 0.01; * p < 0.05.
Errors are not correlated across observations: $cor(u_i, u_j) = 0 \;\forall\; i \neq j$
For simple cross-sectional data, this is rarely an issue
Time-series & panel data nearly always contain serial correlation or autocorrelation between errors
Errors may be clustered (see the sketch below)
We'll deal with these fixes when we talk about panel data (or time-series if necessary)
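As a preview of the fix, `estimatr`'s `lm_robust()` can also compute cluster-robust standard errors through its `clusters` argument; a hypothetical sketch, assuming the data contained a grouping variable `district` (which this lecture's data does not show):

```r
library(estimatr)
# cluster-robust SEs; `district` is a hypothetical grouping variable
lm_robust(testscr ~ str,
          data = CASchool,
          clusters = district,
          se_type = "stata")
```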
Outliers can affect the slope (and intercept) of the line and add bias
In any case, compare how including/dropping outliers affects the regression, and always discuss outliers!
|           | No Outliers | Outliers   |
|-----------|-------------|------------|
| Intercept | 698.93 ***  | 641.40 *** |
|           | (9.47)      | (11.21)    |
| STR       | -2.28 ***   | 0.71       |
|           | (0.48)      | (0.57)     |
| N         | 420         | 423        |
| R-Squared | 0.05        | 0.00       |
| SER       | 18.58       | 23.76      |

*** p < 0.001; ** p < 0.01; * p < 0.05.
The `car` package has an `outlierTest()` command to run on the regression:

```r
library("car")
# Use Bonferroni test
outlierTest(school_outlier_reg) # will point out which obs #s seem outliers
```

```
##     rstudent unadjusted p-value Bonferroni p
## 422 8.822768         3.0261e-17   1.2800e-14
## 423 7.233470         2.2493e-12   9.5147e-10
## 421 6.232045         1.1209e-09   4.7414e-07
```
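To produce the "No Outliers" column above you would drop the flagged rows and re-estimate; a sketch, where `CASchool_outliers` is a hypothetical name for the data containing the three extra observations:

```r
# drop the three flagged observations (rows 421-423) and re-run the regression
# `CASchool_outliers` is an assumed name for the data including the outliers
CASchool_trimmed <- CASchool_outliers[-c(421, 422, 423), ]
lm(testscr ~ str, data = CASchool_trimmed)
```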
$$\hat{\beta}_1 \sim N\left(E[\hat{\beta}_1], \sigma_{\hat{\beta}_1}\right)$$