2.5: OLS: Precision and Diagnostics - Class Notes
Tuesday, October 1, 2019
Overview
Last class and this class we are looking at the sampling distribution of the OLS estimators (particularly \(\hat{\beta}_1\)). Last class we looked at the center of the distribution, which is the true \(\beta_1\) so long as the assumptions about \(u\) hold:
- When cor(X,u)=0, X is exogenous and the OLS estimators are unbiased.
- When cor(X,u)≠0, X is endogenous and the OLS estimators are biased.
Today we continue looking at the sampling distribution by determining the variation in \(\hat{\beta}_1\) (its variance or its standard error, which is the square root of the variance, as always!). We look at the formula and see the three major determinants of variation in \(\hat{\beta}_1\):
- Goodness of fit of the regression (SER or \(\hat{\sigma}_u\))
- Sample size n
- Variation in X
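For reference, a common textbook form of this formula is sketched below. It assumes homoskedastic errors, and the notation used in class may differ slightly. Here var(X) is computed with divisor n; with the usual sample variance, n becomes n-1, which matters little in large samples.

```latex
\[
var(\hat{\beta}_1) = \frac{\hat{\sigma}_u^2}{n \times var(X)},
\qquad
se(\hat{\beta}_1) = \sqrt{var(\hat{\beta}_1)} = \frac{\hat{\sigma}_u}{\sqrt{n} \times sd(X)}
\]
```

Each determinant shows up directly: a smaller \(\hat{\sigma}_u\) (better fit), a larger n, or more variation in X each shrink the standard error of \(\hat{\beta}_1\).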
We also look at the diagnostics of a regression by looking at its residuals (\(\hat{u}_i\)) for anomalies. We focus on the problem of heteroskedasticity, where the variation in \(\hat{u}_i\) changes over the range of X, which violates assumption 2 (errors are homoskedastic). We cover how to detect it, test for it, and fix it with some packages. We also look at outliers, which can bias the regression. Finally, we look at how to present regression results.
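As a rough illustration of detecting heteroskedasticity and outliers, here is a minimal sketch on simulated data (not the class dataset). The variable names x, y, and sim are made up for this example, and the error spread is built to grow with x so the test has something to find.

```r
library(lmtest)  # bptest()
library(car)     # outlierTest()

# Simulated data (hypothetical, for illustration only)
set.seed(1)
n <- 500
x <- runif(n, 1, 10)
# The error spread grows with x, so errors are heteroskedastic by construction
u <- rnorm(n, mean = 0, sd = 0.5 * x)
y <- 2 + 3 * x + u
sim <- data.frame(x, y)

reg <- lm(y ~ x, data = sim)

# Visual check: residuals fanning out as x grows suggest heteroskedasticity
plot(sim$x, resid(reg), xlab = "x", ylab = "Residuals")
abline(h = 0, lty = 2)

# Breusch-Pagan test: the null hypothesis is homoskedastic errors
bp <- bptest(reg)
bp

# Bonferroni-adjusted test on the largest studentized residual
out <- outlierTest(reg)
out
```

A small p-value from bptest() is evidence against homoskedasticity; outlierTest() reports whether the most extreme observation is unusual enough to flag.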
We continue the extended example about class sizes and test scores, which comes from a (Stata) dataset from an old textbook that I used to use, Stock and Watson, 2007. Download and follow along with the data from today’s example. Note this is a .dta Stata file. You will need to (install and) load the package haven to read_dta() Stata files into a dataframe.
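A minimal sketch of the read_dta() workflow is below. So that it runs on its own, it first writes a tiny toy dataset to a temporary .dta file; the object name ca_school and the variables str and testscr are assumptions for illustration, and in practice you would pass the path of the file you downloaded.

```r
# install.packages("haven")  # run once if haven is not yet installed
library(haven)

# Toy stand-in for the real file (hypothetical values and variable names)
toy <- data.frame(str = c(17.9, 21.5, 18.7),
                  testscr = c(690.8, 661.2, 643.6))
path <- tempfile(fileext = ".dta")
write_dta(toy, path)

# read_dta() reads a Stata file and returns a tibble (a kind of dataframe)
ca_school <- read_dta(path)
head(ca_school)
```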
Slides
Practice Problems
Today you will be working on R practice problems. Check back later for solutions.
New Packages Mentioned
- broom: for tidy regression outputs, summary statistics, and adding \(\hat{Y}_i\) and \(\hat{u}_i\) into the dataframe
- huxtable: to present regression output in a table with huxreg()
- lmtest: for testing for heteroskedasticity in errors with bptest()
- car: for testing for outliers with outlierTest()
- estimatr: for calculating robust standard errors with lm_robust()
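To give a feel for the first two packages, here is a small sketch using R's built-in mtcars data rather than the class dataset. The regression of mpg on wt is only a placeholder model chosen for illustration.

```r
library(broom)
library(huxtable)

# Placeholder regression on built-in data (not the class example)
reg <- lm(mpg ~ wt, data = mtcars)

tidy(reg)            # one row per coefficient: estimate, std.error, etc.
glance(reg)          # one row of regression-level summary statistics
aug <- augment(reg)  # original data plus .fitted (Y-hat) and .resid (u-hat)
head(aug)

# A regression table; statistics are named explicitly to choose what to show
huxreg("MPG" = reg,
       statistics = c("N" = "nobs", "R-squared" = "r.squared"))
```

The .fitted and .resid columns from augment() are convenient for plotting residuals against X when checking for heteroskedasticity.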
Assignments: Problem Set 2 Answers
Problem Set 2 answers are posted.
Appendix: Robust Standard Errors in R
Since I started using huxtable instead of another package (stargazer) to make regression tables, I have gone all in on estimatr’s lm_robust() function to calculate robust standard errors. Before this, there were some other methods that I had to resort to. You can read about that in this blog post.
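A minimal sketch of the lm_robust() approach follows, again on the built-in mtcars data as a stand-in for the class dataset. The choice of se_type = "stata" (HC1, matching Stata's robust option) is an assumption made for this example; lm_robust() defaults to HC2 if se_type is omitted.

```r
library(estimatr)

# Ordinary lm() for comparison (classical standard errors)
reg <- lm(mpg ~ wt, data = mtcars)

# Same model with heteroskedasticity-robust standard errors
reg_robust <- lm_robust(mpg ~ wt, data = mtcars, se_type = "stata")

# The point estimates are the same; only the standard errors change
coef(reg)
coef(reg_robust)

summary(reg)$coefficients[, "Std. Error"]  # classical SEs
reg_robust$std.error                       # robust SEs
```

Because lm_robust() objects can be tidied like lm() objects, both models can be passed to huxreg() to show normal and robust standard errors side by side.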