1.2: The Quest for Causality

ECON 480 · Econometrics · Fall 2019

Ryan Safner
Assistant Professor of Economics
safner@hood.edu
ryansafner/metricsf19
metricsF19.classes.ryansafner.com

The 2 Big Problems with Data

Two Big Problems with Data

We want to use econometrics to identify causal relationships and make inferences about them

Problem for identification: endogeneity
Problem for inference: randomness

Identification Problem: Endogeneity I

An independent variable is exogenous if its variation is unrelated to other factors that affect the dependent variable
An independent variable is endogenous if its variation is related to other factors that affect the dependent variable
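
A minimal simulation sketch of the distinction, using only Python's standard library. Everything here is an assumption made for illustration rather than anything from the course data: the data-generating process, the names `u`, `x_exog`, and `x_endog`, and the true effect of 2.0.

```python
import random

def ols_slope(x, y):
    """Slope of a simple regression of y on x: cov(x, y) / var(x)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    return cov_xy / var_x

rng = random.Random(480)   # fixed seed so the sketch is reproducible
n = 20_000
TRUE_EFFECT = 2.0          # assumed causal effect of x on y

# u: "other factors" that affect y but are not observed
u = [rng.gauss(0, 1) for _ in range(n)]

# Exogenous x: its variation is unrelated to u
x_exog = [rng.gauss(0, 1) for _ in range(n)]
y_exog = [TRUE_EFFECT * x + e for x, e in zip(x_exog, u)]

# Endogenous x: its variation is partly driven by u
x_endog = [rng.gauss(0, 1) + e for e in u]
y_endog = [TRUE_EFFECT * x + e for x, e in zip(x_endog, u)]

slope_exog = ols_slope(x_exog, y_exog)
slope_endog = ols_slope(x_endog, y_endog)
print(f"exogenous x:  estimated slope = {slope_exog:.2f}")
print(f"endogenous x: estimated slope = {slope_endog:.2f}")
```

Under these assumptions the first slope should land near the true 2.0, while the second is pulled toward roughly 2.5, because `x_endog` picks up part of the effect of `u`.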

Identification Problem: Endogeneity IV

Confusingly, these terms mean something different in econometrics than in the theoretical economics models where you have heard them before
In theoretical models:
- "Exogenous": a parameter determined outside of the model and taken as given
- "Endogenous": a variable whose value is determined by the model

Identification Problem: Endogeneity V

Example 1: In a classic supply and demand model:

Exogenous parameters: income, prices of other goods, cost, technology
Endogenous variables: equilibrium price, equilibrium quantity

Identification Problem: Endogeneity VI

Example 2: In a consumer optimization model:

Exogenous parameters: market prices, income, utility function
Endogenous variables: utility-maximizing bundle of goods

Inference Problem: Randomness

Data is random due to natural sampling variation
- Taking one sample of a population will yield slightly different information than another sample of the same population
Common in statistics, easy to fix
Inferential Statistics: making claims about a wider population using sample data
- We use common tools and techniques to deal with randomness
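
Sampling variation can be illustrated with a short sketch. The population below is hypothetical (uniformly drawn scores on the 1-5 health scale), and the sample size of 500 is an arbitrary choice; the standard error is shown as one example of a common tool for describing this kind of randomness.

```python
import random
import statistics

rng = random.Random(2019)  # fixed seed for reproducibility

# Hypothetical population: 100,000 health scores on the 1-5 scale
population = [rng.choice([1, 2, 3, 4, 5]) for _ in range(100_000)]
population_mean = statistics.mean(population)

# Two independent random samples from the same population
sample_a = rng.sample(population, 500)
sample_b = rng.sample(population, 500)
mean_a = statistics.mean(sample_a)
mean_b = statistics.mean(sample_b)

# Standard error of the sample mean for sample A
std_error_a = statistics.stdev(sample_a) / len(sample_a) ** 0.5

print(f"population mean: {population_mean:.3f}")
print(f"sample A mean:   {mean_a:.3f} (std. error {std_error_a:.3f})")
print(f"sample B mean:   {mean_b:.3f}")
```

The two sample means will typically differ slightly from each other and from the population mean, by amounts on the order of the standard error.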

Identifying Causal Effects: Random Controlled Trials

Random Controlled Trials (RCTs) I

The ideal way to demonstrate causation is through a randomized controlled trial (RCT) or "random experiment"
- Randomly assign experimental units (e.g. people, firms, etc.) into groups
- Treatment group(s) get a (kind of) treatment
- Control group gets no treatment
- Compare average results of treatment and control groups to estimate the average treatment effect (ATE)
We will understand "causality" to mean the ATE from an ideal RCT
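
The steps above can be sketched as a small simulation. The units, their baseline outcomes, and the true effect of 0.5 are all made up for illustration; a real trial would of course not know the true effect in advance.

```python
import random
import statistics

rng = random.Random(1)     # fixed seed for reproducibility
TRUE_ATE = 0.5             # assumed treatment effect, for illustration only

# Hypothetical experimental units, each with a baseline outcome
units = [{"baseline": rng.gauss(3.5, 1.0)} for _ in range(10_000)]

# Step 1: randomly assign units to groups
rng.shuffle(units)
half = len(units) // 2
treatment_group = units[:half]
control_group = units[half:]

# Step 2: the treatment group gets the treatment, the control group does not
treated_outcomes = [unit["baseline"] + TRUE_ATE for unit in treatment_group]
control_outcomes = [unit["baseline"] for unit in control_group]

# Step 3: compare average outcomes to estimate the ATE
estimated_ate = statistics.mean(treated_outcomes) - statistics.mean(control_outcomes)
print(f"estimated ATE: {estimated_ate:.3f} (true value assumed: {TRUE_ATE})")
```

With a sample this large, the estimate should fall close to the assumed 0.5, though not exactly on it, because the assignment itself is random.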

Random Controlled Trials (RCTs) II

Classic (simplified) procedure of a randomized controlled trial (RCT) from medicine

Random Controlled Trials (RCTs) IV

Random assignment to groups ensures that, on average, the only difference between members of the treatment and control groups is whether they receive treatment
Selection bias: (pre-existing) differences between members of treatment and control groups, other than treatment, that affect the outcome

[Figure: Treatment Group vs. Control Group (Selection Bias)]

Some Theory By Example

Example: The Effect of Having Health Insurance I

Example: What is the effect of having health insurance on health outcomes?

National Health Interview Survey (NHIS) asks "Would you say your health in general is excellent, very good, good, fair, or poor?"
Outcome variable: Index of health (1-poor to 5-excellent) in a sample of married NHIS respondents in 2009 who may or may not have health insurance
Treatment: Having health insurance (vs. not)

Example: The Effect of Having Health Insurance II

Angrist, Joshua & Jorn-Steffen Pischke, 2015, Mastering 'Metrics

Example: The Effect of Having Health Insurance III

$Y$: outcome variable (health index score, 1-5)
$Y_i$: health score of an individual $i$
Individual $i$ has a choice, leading to one of two outcomes:
- $Y^0_i$: individual $i$ has not purchased health insurance
- $Y^1_i$: individual $i$ has purchased health insurance
$\delta_i = Y^1_i - Y^0_i$: causal effect for individual $i$ of purchasing health insurance

Example: A Hypothetical Comparison

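	John	Maria
Score without insurance, $Y^0_i$	3	5
Score with insurance, $Y^1_i$	4	5
Effect of insurance, $\delta_i = Y^1_i - Y^0_i$	1	0
Observed score, $Y_i$	4	5

John will choose to buy health insurance
Maria will choose to not buy health insurance
Health insurance improves John's score by 1, has no effect on Maria's score
Note, all we can observe in the data are their health outcomes after they have chosen (not) to buy health insurance: $Y^1_J = 4$ and $Y^0_M = 5$
Observed difference between John and Maria: $Y^1_J - Y^0_M = 4 - 5 = -1$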

Counterfactuals

John	Maria
$Y_J = 4$	$Y_M = 5$

This is all the data we actually observe

Observed difference between John and Maria: $Y_J - Y_M = 4 - 5 = -1$
Recall:
- John has bought health insurance ($Y_J = Y^1_J$)
- Maria has not bought insurance ($Y_M = Y^0_M$)
We don't see the counterfactuals:
- John's score without insurance ($Y^0_J$)
- Maria's score with insurance ($Y^1_M$)

Counterfactuals

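John	Maria
$Y_J = 4$	$Y_M = 5$

This is all the data we actually observe

Observed difference between John and Maria: $Y^1_J - Y^0_M = 4 - 5 = -1$
Algebra trick: add and subtract $Y^0_J$ (John's score without insurance) to the equation

$$Y^1_J - Y^0_M = (Y^1_J - Y^0_J) + (Y^0_J - Y^0_M) = 1 + (-2) = -1$$

$Y^1_J - Y^0_J = 1$: Causal effect for John of buying insurance
$Y^0_J - Y^0_M = -2$: Difference between John & Maria pre-treatment, "selection bias"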
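
The add-and-subtract step can be checked numerically with the scores from the John and Maria example (John: 3 without insurance, 4 with; Maria: 5 either way). The dictionary layout is just an illustrative choice.

```python
# Potential outcomes from the John and Maria example (health index, 1-5).
# Key 0 = score without insurance, key 1 = score with insurance.
john = {0: 3, 1: 4}
maria = {0: 5, 1: 5}

# What the data show: John insured, Maria uninsured
observed_difference = john[1] - maria[0]

# Add and subtract John's uninsured score to split the comparison in two
causal_effect_john = john[1] - john[0]
selection_bias = john[0] - maria[0]

print("observed difference:", observed_difference)
print("causal effect for John:", causal_effect_john)
print("selection bias:", selection_bias)

# The two pieces add back up to the observed difference
assert observed_difference == causal_effect_john + selection_bias
```

The observed gap of -1 would suggest insurance hurts health, even though the causal effect for John is +1; the selection bias of -2 accounts for the difference.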

Example II

Selection bias: (pre-existing) differences between members of treatment and control groups other than treatment, that affect the outcome
- i.e. John and Maria start out with very different health scores before either decides to buy insurance or not ("receive treatment" or not)

[Figure: John (Treated) vs. Maria (Control)]

Example: Thinking about the Data

Ideal (but impossible) Data

Individual	Insured	Not Insured	Diff
John	4.0	3.0	1.0
Maria	5.0	5.0	0.0
Average	4.5	4.0	0.5

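Individual treatment effect (for individual $i$): $\delta_i = Y^1_i - Y^0_i$ (insured score minus not-insured score)
Average treatment effect: $ATE = \frac{1}{n} \sum_{i=1}^{n} \delta_i$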

Actual (observed) Data

Individual	Insured	Not Insured	Diff
John	4.0	?	?
Maria	?	5.0	?
Average	?	?	?

We never get to see each person's counterfactual state to compare and calculate ITEs or ATE
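
A small sketch of the two tables, using the numbers from the slides. Representing a missing counterfactual as `None` and the helper name `individual_effect` are illustrative choices, not part of the original material.

```python
from statistics import mean

# Ideal (but impossible) data: both potential outcomes for each person
ideal = {
    "John":  {"insured": 4.0, "not_insured": 3.0},
    "Maria": {"insured": 5.0, "not_insured": 5.0},
}

# Individual treatment effects and their average
ite = {name: row["insured"] - row["not_insured"] for name, row in ideal.items()}
ate = mean(list(ite.values()))
print("ITEs:", ite)
print("ATE:", ate)

# Actual (observed) data: the counterfactual cell is missing (None)
observed = {
    "John":  {"insured": 4.0, "not_insured": None},
    "Maria": {"insured": None, "not_insured": 5.0},
}

def individual_effect(row):
    """Return the ITE, or None when a potential outcome is unobserved."""
    if row["insured"] is None or row["not_insured"] is None:
        return None
    return row["insured"] - row["not_insured"]

observed_ite = {name: individual_effect(row) for name, row in observed.items()}
print("ITEs from observed data:", observed_ite)
```

With the ideal data the ATE is simply the mean of the individual effects; with the observed data every individual effect is unavailable, so neither the ITEs nor the ATE can be computed directly.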

Example: Thinking about the Data

Basic comparisons tell us something about outcomes, but not ATE (the causal effect we're looking for)

Selection bias: difference in average $Y^0_i$ (health score without insurance) between groups pre-treatment
$Y^0_i$ includes everything about person $i$ relevant to health except treatment (insurance) status
- Age, sex, height, weight, climate, smoker, exercise, diet, etc.
- Imagine a world where nobody has insurance (treatment): who would have the highest health scores?

Actual (observed) Data

Individual	Insured	Not Insured	Diff
John	4.0	?	?
Maria	?	5.0	?
Average	?	?	?
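
One way to see why a basic comparison of group averages differs from the causal effect is to write the add-and-subtract step in terms of averages. The notation is an assumption made for this sketch: $D_i = 1$ if person $i$ is insured and $D_i = 0$ otherwise, while $Y^1_i$ and $Y^0_i$ are person $i$'s health scores with and without insurance.

```latex
\begin{align*}
\underbrace{E[Y_i \mid D_i = 1] - E[Y_i \mid D_i = 0]}_{\text{difference in group means}}
  &= E[Y^1_i \mid D_i = 1] - E[Y^0_i \mid D_i = 0] \\
  &= \underbrace{E[Y^1_i \mid D_i = 1] - E[Y^0_i \mid D_i = 1]}_{\text{average effect for the insured}}
   + \underbrace{E[Y^0_i \mid D_i = 1] - E[Y^0_i \mid D_i = 0]}_{\text{selection bias}}
\end{align*}
```

The first term coincides with the ATE only under additional conditions, such as a constant treatment effect or random assignment. In the two-person example the two terms are 1 and -2, which sum to the observed difference of -1.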

Random Assignment: The Silver Bullet

If treatment is randomly assigned for a large sample, it eliminates selection bias!
Treatment and control groups differ on average by nothing except treatment status
Creates ceteris paribus conditions in economics: groups are identical on average (holding constant age, sex, height, etc.)

[Figure: Treatment Group vs. Control Group]
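
A simulation sketch contrasting self-selection with random assignment. The setup is entirely hypothetical: untreated health scores are drawn from a normal distribution, the effect of insurance is assumed to be a constant 0.5, and the self-selection rule (healthier-than-median people buy insurance) is one arbitrary example of non-random take-up.

```python
import random
import statistics

rng = random.Random(42)    # fixed seed for reproducibility
TRUE_EFFECT = 0.5          # assumed constant effect of insurance on health
N = 20_000

# Hypothetical untreated health scores (Y0) for a large sample
y0 = [rng.gauss(3.5, 1.0) for _ in range(N)]

def difference_in_means(treated_flags):
    """Compare average observed outcomes of treated and untreated people."""
    treated = [y + TRUE_EFFECT for y, d in zip(y0, treated_flags) if d]
    control = [y for y, d in zip(y0, treated_flags) if not d]
    return statistics.mean(treated) - statistics.mean(control)

def selection_bias(treated_flags):
    """Pre-treatment gap: average Y0 of the treated minus average Y0 of the controls."""
    treated_y0 = [y for y, d in zip(y0, treated_flags) if d]
    control_y0 = [y for y, d in zip(y0, treated_flags) if not d]
    return statistics.mean(treated_y0) - statistics.mean(control_y0)

# Self-selection (assumed rule): healthier-than-median people buy insurance
median_y0 = statistics.median(y0)
self_selected = [y > median_y0 for y in y0]

# Random assignment: a fair coin flip decides treatment
randomized = [rng.random() < 0.5 for _ in range(N)]

for label, flags in [("self-selected", self_selected), ("randomized", randomized)]:
    print(f"{label:>13}: difference in means = {difference_in_means(flags):.2f}, "
          f"selection bias = {selection_bias(flags):.2f}")
```

Under these assumptions the self-selected comparison overstates the effect by the amount of the selection bias, while under random assignment the selection bias is close to zero and the difference in means is close to the assumed 0.5.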

Quasi-Experiments

The Quest for Causal Effects I

RCTs are considered the "gold standard" for causal claims
But society is not our laboratory (probably a good thing!)
We can rarely conduct experiments to get data

The Quest for Causal Effects II

Instead, we often rely on observational data
This data is not random!
Must take extra care in forming an identification strategy
To make good claims about causation in society, we must get clever!

Natural Experiments

Economists often resort to searching for natural experiments
Some events beyond our control occur that separate otherwise similar entities into a "treatment" group and a "control" group that we can compare
e.g. natural disasters, U.S. State laws, military draft

The First Natural Experiment

John Snow (1813-1858)

John Snow utilized the first famous natural experiment to establish the foundations of epidemiology and the germ theory of disease
Water pumps with sources downstream of a sewage dump in the Thames river spread cholera while water pumps with sources upstream did not

Famous Natural Experiments

Oregon Health Insurance Experiment: Oregon used a lottery to grant Medicaid access to 10,000 people, showing that access to Medicaid increased use of health services, lowered debt, etc. relative to those not on Medicaid
Angrist (1990) finds that lifetime earnings of (randomly) drafted Vietnam veterans are 15% lower than those of non-veterans
Card & Krueger (1994) find that a minimum wage hike in fast-food restaurants on the NJ side of the border had no disemployment effects relative to restaurants on the PA side of the border during the same period
Acemoglu, Johnson, and Robinson (2001) find that inclusive institutions lead to higher economic development than extractive institutions, determined by a colony's disease environment in 1500
We will look at some of these in greater detail throughout the course
A great list, with explanations is here
