2.4: OLS: Goodness of Fit and Bias - Class Notes
Tuesday, September 24, 2019
Overview
Today we continue looking at basic OLS regression. We will cover how to measure whether a regression line is a good fit (using $R^2$ and $\sigma_u$, or the SER), and whether OLS estimators are biased. These will depend on four critical assumptions about $u$.
In doing so, we begin an ongoing exploration into inferential statistics, which will finally become clear in another week. The most confusing part is recognizing that there is a sampling distribution of each OLS estimator. We want to measure the center of that sampling distribution, to see if the estimator is biased. Next class we will measure the spread of that distribution.
We continue the extended example about class sizes and test scores, which comes from a (Stata) dataset from an old textbook that I used to use, Stock and Watson (2007). Download and follow along with the data from today's example. Note this is a .dta (Stata) file; you will need to (install and) load the haven package in order to read_dta() Stata files into a dataframe.
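For example, a minimal sketch of loading the data (assuming you saved the downloaded file in your working directory; the filename caschool.dta below is just a placeholder for whatever you named the download):

```r
# install.packages("haven")   # uncomment if haven is not yet installed
library(haven)

# Read the Stata .dta file into a dataframe
# ("caschool.dta" is a placeholder filename)
ca_data <- read_dta("caschool.dta")

head(ca_data)
```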
Slides
Assignments: Problem Set 2 DUE
Problem Set 2 is DUE. Answers will be posted later today.
Appendix: OLS Estimators
Deriving the OLS Estimators
The population linear regression model is:
$$Y_i = \beta_0 + \beta_1 X_i + u_i$$
The errors ($u_i$) are unobserved, but for candidate values of $\hat{\beta}_0$ and $\hat{\beta}_1$, we can obtain an estimate of each error, called the residual. Algebraically, the residual is:
$$\hat{u}_i = Y_i - \hat{\beta}_0 - \hat{\beta}_1 X_i$$
Recall our goal is to find the $\hat{\beta}_0$ and $\hat{\beta}_1$ that minimize the sum of squared errors (SSE):
$$SSE = \sum_{i=1}^n \hat{u}_i^2$$
So our minimization problem is:
$$\min_{\hat{\beta}_0,\, \hat{\beta}_1} \sum_{i=1}^n (Y_i - \hat{\beta}_0 - \hat{\beta}_1 X_i)^2$$
Using calculus, we take the partial derivatives and set them equal to 0 to find the minimum. The first order conditions are:
$$\frac{\partial SSE}{\partial \hat{\beta}_0} = -2\sum_{i=1}^n (Y_i - \hat{\beta}_0 - \hat{\beta}_1 X_i) = 0$$
$$\frac{\partial SSE}{\partial \hat{\beta}_1} = -2\sum_{i=1}^n (Y_i - \hat{\beta}_0 - \hat{\beta}_1 X_i)X_i = 0$$
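As a quick sanity check on the minimization problem itself (not part of the original derivation), we can minimize the SSE numerically in R with optim() and confirm the result matches the analytic OLS solution from lm(); the simulated x and y below are purely illustrative:

```r
set.seed(42)
x <- rnorm(100, mean = 20, sd = 4)
y <- 2 + 0.5 * x + rnorm(100)            # illustrative data with known betas

# Sum of squared errors for candidate values b = (beta0, beta1)
sse <- function(b, x, y) sum((y - b[1] - b[2] * x)^2)

# Numerically search for the (beta0, beta1) that minimize SSE
optim(par = c(0, 0), fn = sse, x = x, y = y)$par

# Compare to the analytic OLS estimates
coef(lm(y ~ x))
```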
Finding $\hat{\beta}_0$
Working with the first FOC, divide both sides by $-2$:
$$\sum_{i=1}^n (Y_i - \hat{\beta}_0 - \hat{\beta}_1 X_i) = 0$$
Then expand the summation across all terms and divide by $n$:
$$\underbrace{\frac{1}{n}\sum_{i=1}^n Y_i}_{\bar{Y}} - \underbrace{\frac{1}{n}\sum_{i=1}^n \hat{\beta}_0}_{\hat{\beta}_0} - \underbrace{\frac{1}{n}\sum_{i=1}^n \hat{\beta}_1 X_i}_{\hat{\beta}_1 \bar{X}} = 0$$
Note the first term is $\bar{Y}$, the second is $\hat{\beta}_0$, and the third is $\hat{\beta}_1\bar{X}$. From the rules about summation operators, we define the mean of a random variable $X$ as $\bar{X} = \frac{1}{n}\sum_{i=1}^n X_i$, and the mean of a constant, like $\beta_0$ or $\beta_1$, is itself.
So we can rewrite this as:
$$\bar{Y} - \hat{\beta}_0 - \hat{\beta}_1\bar{X} = 0$$
Rearranging:
$$\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1\bar{X}$$
Finding $\hat{\beta}_1$
To find $\hat{\beta}_1$, take the second FOC and divide by $-2$:
$$\sum_{i=1}^n (Y_i - \hat{\beta}_0 - \hat{\beta}_1 X_i)X_i = 0$$
From the formula for $\hat{\beta}_0$, substitute in for $\hat{\beta}_0$:
$$\sum_{i=1}^n (Y_i - [\bar{Y} - \hat{\beta}_1\bar{X}] - \hat{\beta}_1 X_i)X_i = 0$$
Combining similar terms:
$$\sum_{i=1}^n ([Y_i - \bar{Y}] - \hat{\beta}_1[X_i - \bar{X}])X_i = 0$$
Distribute $X_i$ and expand the expression into the subtraction of two sums (pulling $\hat{\beta}_1$ out of the second sum as a constant):
$$\sum_{i=1}^n [Y_i - \bar{Y}]X_i - \hat{\beta}_1\sum_{i=1}^n [X_i - \bar{X}]X_i = 0$$
Move the second term to the right-hand side:
$$\sum_{i=1}^n [Y_i - \bar{Y}]X_i = \hat{\beta}_1\sum_{i=1}^n [X_i - \bar{X}]X_i$$
Divide to keep just $\hat{\beta}_1$ on the right:
$$\frac{\sum_{i=1}^n [Y_i - \bar{Y}]X_i}{\sum_{i=1}^n [X_i - \bar{X}]X_i} = \hat{\beta}_1$$
Note that from the rules about summation operators:
$$\sum_{i=1}^n [Y_i - \bar{Y}]X_i = \sum_{i=1}^n (Y_i - \bar{Y})(X_i - \bar{X})$$
and:
$$\sum_{i=1}^n [X_i - \bar{X}]X_i = \sum_{i=1}^n (X_i - \bar{X})(X_i - \bar{X}) = \sum_{i=1}^n (X_i - \bar{X})^2$$
Plug in these two facts:
$$\hat{\beta}_1 = \frac{\sum_{i=1}^n (Y_i - \bar{Y})(X_i - \bar{X})}{\sum_{i=1}^n (X_i - \bar{X})^2}$$
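These two formulas are easy to verify in R. A minimal sketch with simulated data (any two numeric variables, such as student–teacher ratio and test scores from the class-size data, would work the same way):

```r
set.seed(1)
x <- rnorm(50, mean = 20, sd = 4)
y <- 2 + 0.5 * x + rnorm(50)

# beta1-hat = sum[(Yi - Ybar)(Xi - Xbar)] / sum[(Xi - Xbar)^2]
b1 <- sum((y - mean(y)) * (x - mean(x))) / sum((x - mean(x))^2)

# beta0-hat = Ybar - beta1-hat * Xbar
b0 <- mean(y) - b1 * mean(x)

c(b0, b1)          # estimates computed "by hand"
coef(lm(y ~ x))    # should match lm()'s estimates
```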
Algebraic Properties of OLS Estimators
The OLS residuals $\hat{u}_i$ and predicted values $\hat{Y}_i$ are chosen by the minimization problem to satisfy:
- The expected value (average) of the residuals is 0: $\frac{1}{n}\sum_{i=1}^n \hat{u}_i = 0$
- The covariance between $X$ and the residuals is 0: $\hat{\sigma}_{X,u} = 0$
Note the first two properties imply strict exogeneity. That is, this is only a valid model if $X$ and $u$ are not correlated.
- The mean of the predicted values of $Y$ is equal to the mean of $Y$: $\bar{\hat{Y}} = \frac{1}{n}\sum_{i=1}^n \hat{Y}_i = \bar{Y}$
- The total sum of squares is equal to the explained sum of squares plus the sum of squared errors:
$$TSS = ESS + SSE$$
$$\sum_{i=1}^n (Y_i - \bar{Y})^2 = \sum_{i=1}^n (\hat{Y}_i - \bar{Y})^2 + \sum_{i=1}^n \hat{u}_i^2$$
Recall $R^2$ is $\frac{ESS}{TSS}$ or $1 - \frac{SSE}{TSS}$.
- The regression line passes through the point $(\bar{X}, \bar{Y})$, i.e. the mean of $X$ and the mean of $Y$.
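All of these algebraic properties can be confirmed on any fitted regression. A short sketch in R, again using illustrative simulated data:

```r
set.seed(2)
x <- rnorm(200)
y <- 1 + 2 * x + rnorm(200)
reg <- lm(y ~ x)

u_hat <- residuals(reg)   # OLS residuals
y_hat <- fitted(reg)      # OLS predicted values

mean(u_hat)               # ~ 0: residuals average to zero
cov(x, u_hat)             # ~ 0: residuals are uncorrelated with X
c(mean(y_hat), mean(y))   # equal: mean of predicted Y = mean of Y

TSS <- sum((y - mean(y))^2)
ESS <- sum((y_hat - mean(y))^2)
SSE <- sum(u_hat^2)
c(TSS, ESS + SSE)                        # TSS = ESS + SSE
c(ESS / TSS, summary(reg)$r.squared)     # both equal R-squared
```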
Bias in $\hat{\beta}_1$
Begin with the formula we derived for $\hat{\beta}_1$:
$$\hat{\beta}_1 = \frac{\sum_{i=1}^n (Y_i - \bar{Y})(X_i - \bar{X})}{\sum_{i=1}^n (X_i - \bar{X})^2}$$
Recall from Rule 6 of summations, we can rewrite the numerator as:
$$\sum_{i=1}^n (Y_i - \bar{Y})(X_i - \bar{X}) = \sum_{i=1}^n Y_i(X_i - \bar{X})$$
$$\hat{\beta}_1 = \frac{\sum_{i=1}^n Y_i(X_i - \bar{X})}{\sum_{i=1}^n (X_i - \bar{X})^2}$$
We know the true population relationship is expressed as:
$$Y_i = \beta_0 + \beta_1 X_i + u_i$$
Substituting this in for $Y_i$ in the expression for $\hat{\beta}_1$ above:
$$\hat{\beta}_1 = \frac{\sum_{i=1}^n (\beta_0 + \beta_1 X_i + u_i)(X_i - \bar{X})}{\sum_{i=1}^n (X_i - \bar{X})^2}$$
$$\hat{\beta}_1 = \frac{\sum_{i=1}^n \beta_0(X_i - \bar{X}) + \sum_{i=1}^n \beta_1 X_i(X_i - \bar{X}) + \sum_{i=1}^n u_i(X_i - \bar{X})}{\sum_{i=1}^n (X_i - \bar{X})^2}$$
We can simplify this expression using Rules 4 and 5 of summations:
- The first term in the numerator, $\sum_{i=1}^n \beta_0(X_i - \bar{X})$, has the constant $\beta_0$, which can be pulled out of the summation. This gives us the summation of deviations, which add up to 0 as per Rule 4:
$$\sum_{i=1}^n \beta_0(X_i - \bar{X}) = \beta_0\sum_{i=1}^n (X_i - \bar{X}) = \beta_0(0) = 0$$
- The second term in the numerator, $\sum_{i=1}^n \beta_1 X_i(X_i - \bar{X})$, has the constant $\beta_1$, which can be pulled out of the summation. Additionally, Rule 5 tells us $\sum_{i=1}^n X_i(X_i - \bar{X}) = \sum_{i=1}^n (X_i - \bar{X})^2$:
$$\sum_{i=1}^n \beta_1 X_i(X_i - \bar{X}) = \beta_1\sum_{i=1}^n X_i(X_i - \bar{X}) = \beta_1\sum_{i=1}^n (X_i - \bar{X})^2$$
When placed back in the numerator of the fraction, we can see this term simplifies to just $\beta_1$:
$$\frac{\beta_1\sum_{i=1}^n (X_i - \bar{X})^2}{\sum_{i=1}^n (X_i - \bar{X})^2} = \beta_1 \times \frac{\sum_{i=1}^n (X_i - \bar{X})^2}{\sum_{i=1}^n (X_i - \bar{X})^2} = \beta_1$$
Thus, we are left with:
$$\hat{\beta}_1 = \beta_1 + \frac{\sum_{i=1}^n u_i(X_i - \bar{X})}{\sum_{i=1}^n (X_i - \bar{X})^2}$$
Now, take the expectation of both sides:
$$E[\hat{\beta}_1] = E\left[\beta_1 + \frac{\sum_{i=1}^n u_i(X_i - \bar{X})}{\sum_{i=1}^n (X_i - \bar{X})^2}\right]$$
We can break this up using the properties of expectations. First, recall $E[a+b] = E[a] + E[b]$, so we can break apart the two terms:
$$E[\hat{\beta}_1] = E[\beta_1] + E\left[\frac{\sum_{i=1}^n u_i(X_i - \bar{X})}{\sum_{i=1}^n (X_i - \bar{X})^2}\right]$$
Second, the true population value of $\beta_1$ is a constant, so $E[\beta_1] = \beta_1$.
Third, since we assume $X$ is "fixed" and not random, the variation in $X$, $\sum_{i=1}^n (X_i - \bar{X})^2$, in the denominator is just a constant and can be brought outside the expectation:
$$E[\hat{\beta}_1] = \beta_1 + \frac{E\left[\sum_{i=1}^n u_i(X_i - \bar{X})\right]}{\sum_{i=1}^n (X_i - \bar{X})^2}$$
Thus, the properties of the equation are primarily driven by the expectation $E\left[\sum_{i=1}^n u_i(X_i - \bar{X})\right]$. We now turn to this term.
Use the property of summation operators to expand the numerator term:
$$\hat{\beta}_1 = \beta_1 + \frac{\sum_{i=1}^n u_i(X_i - \bar{X})}{\sum_{i=1}^n (X_i - \bar{X})^2}$$
$$\hat{\beta}_1 = \beta_1 + \frac{\sum_{i=1}^n (u_i - \bar{u})(X_i - \bar{X})}{\sum_{i=1}^n (X_i - \bar{X})^2}$$
Now divide the numerator and denominator of the second term by $n$ (i.e. multiply each by $\frac{1}{n}$). Realize this gives us the covariance between $X$ and $u$ in the numerator and the variance of $X$ in the denominator, based on their respective definitions:
$$\hat{\beta}_1 = \beta_1 + \frac{\frac{1}{n}\sum_{i=1}^n (u_i - \bar{u})(X_i - \bar{X})}{\frac{1}{n}\sum_{i=1}^n (X_i - \bar{X})^2}$$
$$\hat{\beta}_1 = \beta_1 + \frac{cov(X,u)}{var(X)}$$
$$\hat{\beta}_1 = \beta_1 + \frac{s_{X,u}}{s_X^2}$$
By the Zero Conditional Mean assumption of OLS, $s_{X,u} = 0$, so the second term vanishes and $\hat{\beta}_1$ is unbiased.
Alternatively, we can express the bias in terms of correlation instead of covariance:
$$E[\hat{\beta}_1] = \beta_1 + \frac{cov(X,u)}{var(X)}$$
From the definition of correlation:
$$cor(X,u) = \frac{cov(X,u)}{s_X s_u}$$
$$cor(X,u)\, s_X s_u = cov(X,u)$$
Plugging this in:
$$E[\hat{\beta}_1] = \beta_1 + \frac{cov(X,u)}{var(X)}$$
$$E[\hat{\beta}_1] = \beta_1 + \frac{cor(X,u)\, s_X s_u}{s_X^2}$$
$$E[\hat{\beta}_1] = \beta_1 + cor(X,u)\frac{s_u}{s_X}$$
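A small simulation makes this bias formula concrete. In the sketch below (all numbers are illustrative assumptions), $X$ and $u$ are generated so that they are correlated, violating exogeneity; the OLS slope is then pushed away from the true $\beta_1 = 2$ by approximately $cor(X,u)\frac{s_u}{s_X}$:

```r
set.seed(3)
n <- 10000
x <- rnorm(n, mean = 0, sd = 2)
u <- 0.5 * x + rnorm(n)      # u is correlated with x: exogeneity fails
y <- 1 + 2 * x + u           # true beta1 = 2

coef(lm(y ~ x))["x"]         # biased slope estimate (about 2.5 here)

# Predicted value from the bias formula: beta1 + cor(X,u) * (s_u / s_X)
2 + cor(x, u) * sd(u) / sd(x)
```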
Proof of the Unbiasedness of $\hat{\beta}_1$
Begin with the following equation. Admittedly, this is a simplified version where $\hat{\beta}_0 = 0$, but there is no loss of generality in the results.
$$\hat{\beta}_1 = \frac{\sum Y_i X_i}{\sum X_i^2}$$
Substitute for $Y_i$:
$$\hat{\beta}_1 = \frac{\sum (\beta_1 X_i + u_i) X_i}{\sum X_i^2}$$
Distribute $X_i$ in the numerator:
$$\hat{\beta}_1 = \frac{\sum (\beta_1 X_i^2 + u_i X_i)}{\sum X_i^2}$$
Separate the sum into additive pieces:
$$\hat{\beta}_1 = \frac{\sum \beta_1 X_i^2}{\sum X_i^2} + \frac{\sum u_i X_i}{\sum X_i^2}$$
$\beta_1$ is constant, so we can pull it out of the first sum:
$$\hat{\beta}_1 = \beta_1\frac{\sum X_i^2}{\sum X_i^2} + \frac{\sum u_i X_i}{\sum X_i^2}$$
Simplifying the first term, we are left with:
$$\hat{\beta}_1 = \beta_1 + \frac{\sum u_i X_i}{\sum X_i^2}$$
Now if we take expectations of both sides:
$$E[\hat{\beta}_1] = E[\beta_1] + E\left[\frac{\sum u_i X_i}{\sum X_i^2}\right]$$
$\beta_1$ is a constant, so the expectation of $\beta_1$ is itself:
$$E[\hat{\beta}_1] = \beta_1 + E\left[\frac{\sum u_i X_i}{\sum X_i^2}\right]$$
Using the properties of expectations, we can pull out $\frac{1}{\sum X_i^2}$ as a constant:
$$E[\hat{\beta}_1] = \beta_1 + \frac{1}{\sum X_i^2}E\left[\sum u_i X_i\right]$$
Again using the properties of expectations, we can move the expectation inside the summation operator (the expectation of a sum is the sum of expectations):
$$E[\hat{\beta}_1] = \beta_1 + \frac{1}{\sum X_i^2}\sum E[u_i X_i]$$
Under the exogeneity condition, the correlation between $X_i$ and $u_i$ is 0, so $E[u_i X_i] = 0$ and the second term vanishes:
$$E[\hat{\beta}_1] = \beta_1$$
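To connect this back to the sampling distribution of $\hat{\beta}_1$, we can simulate drawing many samples in R (under the assumption that $u$ is independent of $X$, so exogeneity holds) and check that the estimates are centered on the true $\beta_1$; a minimal sketch:

```r
set.seed(4)
true_b1 <- 2

one_sample <- function(n = 100) {
  x <- rnorm(n)
  u <- rnorm(n)                 # u independent of x: exogeneity holds
  y <- 1 + true_b1 * x + u
  coef(lm(y ~ x))["x"]          # beta1-hat from this one sample
}

b1_hats <- replicate(5000, one_sample())
mean(b1_hats)                   # ~ 2: sampling distribution is centered on true beta1
```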