Problem for identification: endogeneity
Problem for inference: randomness
An independent variable (X) is exogenous if its variation is unrelated to other factors that affect the dependent variable (Y)
An independent variable (X) is endogenous if its variation is related to other factors that affect the dependent variable (Y)
Confusingly, these terms mean something different in econometrics than in the theoretical economics models where you have heard them before
In theoretical models:
Example 1: In a classic supply and demand model:
Example 2: In a consumer optimization model:
Data is random due to natural sampling variation
Common in statistics, easy to fix
Inferential Statistics: making claims about a wider population using sample data
Classic (simplified) procedure of a randomized controlled trial (RCT) from medicine
[Figure: Treatment Group and Control Group]
Random assignment to groups ensures that, on average, the only difference between members of the treatment and control groups is whether they receive treatment
Selection bias: (pre-existing) differences between members of treatment and control groups other than treatment, that affect the outcome
[Figure: Treatment Group and Control Group, differing by Selection Bias]
Example: What is the effect of having health insurance on health outcomes?
National Health Interview Survey (NHIS) asks "Would you say your health in general is excellent, very good, good, fair, or poor?"
Outcome variable (Y): Index of health (1-poor to 5-excellent) in a sample of married NHIS respondents in 2009 who may or may not have health insurance
Treatment (X): Having health insurance (vs. not)
Angrist, Joshua & Jörn-Steffen Pischke, 2015, Mastering 'Metrics: The Path from Cause to Effect
$Y$: outcome variable (health index score, 1-5)
$Y_i$: health score of an individual $i$
Individual $i$ has a choice, leading to one of two potential outcomes:
$Y_{0i}$: health score of individual $i$ if they have not purchased health insurance
$Y_{1i}$: health score of individual $i$ if they have purchased health insurance
$Y_{1i}-Y_{0i}$: causal effect for individual $i$ of purchasing health insurance
John | Maria |
---|---|
$Y_{0J}=3$ | $Y_{0M}=5$ |
$Y_{1J}=4$ | $Y_{1M}=5$ |
$Y_{1J}-Y_{0J}=1$ | $Y_{1M}-Y_{0M}=0$ |
$Y_J=Y_{1J}=4$ | $Y_M=Y_{0M}=5$ |

John will choose to buy health insurance
Maria will choose to not buy health insurance
Health insurance improves John's score by 1, has no effect on Maria's score
Note, all we can observe in the data are their health outcomes after they have chosen (not) to buy health insurance: $Y_J=4$ and $Y_M=5$
Observed difference between John and Maria: $Y_J-Y_M=-1$
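As a rough illustration of how the observed scores arise from the potential outcomes, here is a minimal Python sketch. The dictionary layout, the `insured` indicator, and the `observed` helper are hypothetical names chosen for this example, not part of the source; only the four scores come from the John and Maria table.

```python
# Potential outcomes for the two people in the example (scores from the table).
potential = {
    "John": {"y0": 3, "y1": 4},
    "Maria": {"y0": 5, "y1": 5},
}

# Hypothetical indicator: 1 if the person buys insurance, 0 otherwise.
insured = {"John": 1, "Maria": 0}


def observed(name):
    """Observed score: y1 if insured, y0 if not (d * y1 + (1 - d) * y0)."""
    d = insured[name]
    return d * potential[name]["y1"] + (1 - d) * potential[name]["y0"]


# Individual causal effects need both potential outcomes,
# so they could never be computed from real data.
effects = {name: p["y1"] - p["y0"] for name, p in potential.items()}

y_john, y_maria = observed("John"), observed("Maria")
observed_diff = y_john - y_maria

print(effects)
print(y_john, y_maria, observed_diff)
```

Only `y_john`, `y_maria`, and their difference correspond to something a researcher could actually see; `effects` is computable here only because the example hands us both potential outcomes.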
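John | Maria |
---|---|
$Y_J=4$ | $Y_M=5$ |

This is all the data we actually observe
Observed difference between John and Maria: $Y_J-Y_M=\underbrace{Y_{1J}-Y_{0M}}_{=-1}$
Recall: John bought insurance, so $Y_J=Y_{1J}$; Maria did not, so $Y_M=Y_{0M}$
We don't see the counterfactuals: $Y_{0J}$ (John without insurance) and $Y_{1M}$ (Maria with insurance)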
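Algebra trick: add and subtract $Y_{0J}$ on the right-hand side of the equation
$$Y_J-Y_M=\underbrace{Y_{1J}-Y_{0J}}_{=1}+\underbrace{Y_{0J}-Y_{0M}}_{=-2}$$
The first term is the causal effect of insurance on John; the second term is selection bias, since $Y_{0J}-Y_{0M}\neq 0$: John and Maria would differ even if neither were insured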
John (Treated)
Maria (Control)
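The add-and-subtract step can be checked numerically. The sketch below uses the example's four scores; the variable names are illustrative assumptions, not notation from the source.

```python
# Potential outcomes from the John and Maria example.
y0_john, y1_john = 3, 4    # John without / with insurance
y0_maria, y1_maria = 5, 5  # Maria without / with insurance

# What the data show: insured John versus uninsured Maria.
observed_diff = y1_john - y0_maria

# Add and subtract John's uninsured outcome (y0_john).
causal_effect_john = y1_john - y0_john  # effect of insurance on John
selection_bias = y0_john - y0_maria     # gap that exists even without insurance

# The two pieces must sum back to the observed difference.
assert observed_diff == causal_effect_john + selection_bias
print(observed_diff, causal_effect_john, selection_bias)  # -1 1 -2
```

The observed gap of -1 has the opposite sign from John's true effect of +1, because the baseline gap of -2 between the two people swamps it.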
Ideal (but impossible) Data
Individual | Insured | Not Insured | Diff |
---|---|---|---|
John | 4.0 | 3.0 | 1.0 |
Maria | 5.0 | 5.0 | 0.0 |
Average | 4.5 | 4.0 | 0.5 |
Actual (observed) Data
Individual | Insured | Not Insured | Diff |
---|---|---|---|
John | 4.0 | ? | ? |
Maria | ? | 5.0 | ? |
Average | ? | ? | ? |
Difference in Group Outcomes = ATE + Selection Bias, where ATE is the average treatment effect
Selection bias: difference in average $Y_{0i}$ between groups pre-treatment
$Y_{0i}$ includes everything about person $i$ relevant to health except treatment (insurance) status
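The group-level version of the same add-and-subtract argument can be sketched as follows. The indicator $D_i$ (1 if person $i$ is insured, 0 otherwise) is notation assumed here for illustration; it does not appear in the source.

```latex
\begin{aligned}
E[Y_i \mid D_i = 1] - E[Y_i \mid D_i = 0]
  &= E[Y_{1i} \mid D_i = 1] - E[Y_{0i} \mid D_i = 0] \\
  &= \underbrace{E[Y_{1i} - Y_{0i} \mid D_i = 1]}_{\text{average effect among the treated}}
   + \underbrace{E[Y_{0i} \mid D_i = 1] - E[Y_{0i} \mid D_i = 0]}_{\text{selection bias}}
\end{aligned}
```

Strictly, the first term is the average effect among those who were treated. It coincides with the ATE when the effect is the same for everyone or when treatment is assigned independently of the potential outcomes. With John as the only treated person and Maria as the only control, the two terms are 1 and -2, which reproduces the observed difference of -1.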
If treatment is randomly assigned for a large sample, it eliminates selection bias!
Treatment and control groups differ on average by nothing except treatment status
Creates ceteris paribus conditions in economics: groups are identical on average (holding constant age, sex, height, etc.)
[Figure: Treatment Group and Control Group]
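A small simulation can make the contrast between self-selection and random assignment concrete. Everything in this sketch is an assumption made for illustration: the sample size, the normal distribution of baseline health, the constant effect of 0.5, and the rule that people with below-average baseline health buy insurance. It uses only the Python standard library.

```python
import random
from statistics import mean

random.seed(0)
N = 100_000
TRUE_EFFECT = 0.5  # assumed constant effect of insurance, for illustration only

# Baseline (uninsured) health for each simulated person; y1 = y0 + effect.
y0 = [random.gauss(4.0, 1.0) for _ in range(N)]
y1 = [v + TRUE_EFFECT for v in y0]


def group_difference(treated):
    """Mean observed outcome of the treated minus that of the untreated."""
    treated_mean = mean(y1[i] for i in range(N) if treated[i])
    control_mean = mean(y0[i] for i in range(N) if not treated[i])
    return treated_mean - control_mean


# Self-selection: people with worse baseline health are the ones who buy insurance.
self_selected = [v < 4.0 for v in y0]
# Random assignment: a coin flip that ignores baseline health.
randomized = [random.random() < 0.5 for _ in range(N)]

diff_selected = group_difference(self_selected)
diff_random = group_difference(randomized)
print(round(diff_selected, 2), round(diff_random, 2))
```

Under these assumptions, the self-selected comparison should come out negative even though insurance helps everyone, while the randomized comparison should land close to the assumed effect of 0.5, because the coin flip makes average baseline health nearly identical across the two groups.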
RCTs are considered the "gold standard" for causal claims
But society is not our laboratory (probably a good thing!)
We can rarely conduct experiments to get data
Instead, we often rely on observational data
This data is not random!
Must take extra care in forming an identification strategy
To make good claims about causation in society, we must get clever!
Economists often resort to searching for natural experiments
Some events beyond our control occur that separate otherwise similar entities into a "treatment" group and a "control" group that we can compare
e.g. natural disasters, U.S. State laws, military draft
John Snow
1813-1858
John Snow utilized the first famous natural experiment to establish the foundations of epidemiology and the germ theory of disease
Water pumps with sources downstream of a sewage dump in the Thames river spread cholera while water pumps with sources upstream did not