1.2: The Quest for Causality

ECON 480 · Econometrics · Fall 2019

Ryan Safner
Assistant Professor of Economics
safner@hood.edu
ryansafner/metricsf19
metricsF19.classes.ryansafner.com

The 2 Big Problems with Data

Two Big Problems with Data

  • We want to use econometrics to identify causal relationships and make inferences about them
  1. Problem for identification: endogeneity

  2. Problem for inference: randomness

Identification Problem: Endogeneity I

  • An independent variable (X) is exogenous if its variation is unrelated to other factors that affect the dependent variable (Y)

  • An independent variable (X) is endogenous if its variation is related to other factors that affect the dependent variable (Y)
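The two definitions above can be illustrated with a small simulation. This is a minimal sketch under assumed values, not part of the original slides: the variable `z` stands in for "other factors that affect Y", the true effect of X on Y is set to 2.0, and the helper names `ols_slope` and `simulate_slope` are made up for this illustration.

```python
import random

def ols_slope(x, y):
    """Slope from a simple regression of y on x: cov(x, y) / var(x)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    var_x = sum((xi - mean_x) ** 2 for xi in x)
    return cov_xy / var_x

def simulate_slope(endogenous, n=20_000, true_effect=2.0, seed=480):
    """Simulate Y = true_effect * X + Z + noise and return the estimated slope.

    Z is an unobserved factor that affects Y. If `endogenous` is True,
    X is built to move with Z; otherwise X varies independently of Z.
    """
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        z = rng.gauss(0, 1)  # other factor affecting Y (assumed)
        x = rng.gauss(0, 1) + (z if endogenous else 0.0)
        y = true_effect * x + z + rng.gauss(0, 1)
        xs.append(x)
        ys.append(y)
    return ols_slope(xs, ys)

exogenous_slope = simulate_slope(endogenous=False)
endogenous_slope = simulate_slope(endogenous=True)
print(f"exogenous X:  estimated slope = {exogenous_slope:.2f}")
print(f"endogenous X: estimated slope = {endogenous_slope:.2f}")
```

When X is exogenous, the estimated slope should land close to the true effect of 2.0. When X is endogenous, the slope also picks up part of the effect of `z`, so it should land noticeably above 2.0 (around 2.5 under these assumed variances).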

Identification Problem: Endogeneity IV

  • Confusingly, these terms mean something different in econometrics than in the theoretical economics models where you have heard them before

  • In theoretical models:

    • "Exogenous": a parameter determined outside of the model and taken as given
    • "Endogenous": a variable whose value is determined by the model

Identification Problem: Endogeneity V

Example 1: In a classic supply and demand model:

  • Exogenous parameters: income, prices of other goods, cost, technology
  • Endogenous variables: equilibrium price, equilibrium quantity

Identification Problem: Endogeneity VI

Example 2: In a consumer optimization model:

  • Exogenous parameters: market prices, income, utility function
  • Endogenous variables: utility-maximizing bundle of goods

Inference Problem: Randomness

  • Data is random due to natural sampling variation

    • Taking one sample of a population will yield slightly different information than another sample of the same population
  • Common in statistics, easy to fix

  • Inferential Statistics: making claims about a wider population using sample data

    • We use common tools and techniques to deal with randomness
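Sampling variation can be seen directly by drawing repeated samples from one population. The sketch below is only an illustration under assumptions: the population of 100,000 scores from 1 to 5 is made up, as are the sample sizes and the helper `sample_mean`.

```python
import random
import statistics

rng = random.Random(2019)

# Hypothetical population of 100,000 values (assumed: scores from 1 to 5).
population = [rng.choice([1, 2, 3, 4, 5]) for _ in range(100_000)]
population_mean = sum(population) / len(population)

def sample_mean(n):
    """Mean of one simple random sample of size n, drawn without replacement."""
    sample = rng.sample(population, n)
    return sum(sample) / n

# Two samples of the same size from the same population.
mean_a = sample_mean(100)
mean_b = sample_mean(100)
print(f"population mean = {population_mean:.3f}")
print(f"sample A mean   = {mean_a:.3f}")
print(f"sample B mean   = {mean_b:.3f}")

# Repeat many times to see how much sample means vary at each sample size.
spread_small = statistics.stdev([sample_mean(100) for _ in range(200)])
spread_large = statistics.stdev([sample_mean(1_000) for _ in range(200)])
print(f"std. dev. of sample means, n = 100:  {spread_small:.3f}")
print(f"std. dev. of sample means, n = 1000: {spread_large:.3f}")
```

The two sample means will typically differ from each other and from the population mean, and the spread of sample means shrinks as the sample size grows. That spread is what inferential statistics quantifies.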

Identifying Causal Effects: Randomized Controlled Trials

Randomized Controlled Trials (RCTs) I

  • The ideal way to demonstrate causation is through a randomized controlled trial (RCT) or "random experiment"
    • Randomly assign experimental units (e.g. people, firms, etc.) into groups
    • Treatment group(s) get a (kind of) treatment
    • Control group gets no treatment
    • Compare results of treatment and control groups to observe the average treatment effect (ATE)
  • We will understand "causality" to mean the ATE from an ideal RCT
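The RCT steps listed above can be sketched as a short simulation. Everything numeric here is an assumption for illustration: 10,000 hypothetical units, untreated outcomes drawn from a normal distribution, and a true treatment effect of 1.0 for every unit.

```python
import random

rng = random.Random(480)

# Hypothetical experimental units. Each has an untreated outcome (y0) and a
# treated outcome (y1); the true effect is assumed to be 1.0 for everyone.
TRUE_EFFECT = 1.0
units = []
for _ in range(10_000):
    y0 = rng.gauss(3.0, 1.0)
    units.append({"y0": y0, "y1": y0 + TRUE_EFFECT})

# Step 1: randomly assign units to groups (shuffle, then split in half).
rng.shuffle(units)
half = len(units) // 2
treatment_group, control_group = units[:half], units[half:]

# Steps 2-3: treated units reveal y1, control units reveal y0.
treated_outcomes = [u["y1"] for u in treatment_group]
control_outcomes = [u["y0"] for u in control_group]

# Step 4: compare group averages to estimate the ATE.
def mean(values):
    return sum(values) / len(values)

estimated_ate = mean(treated_outcomes) - mean(control_outcomes)
print(f"estimated ATE = {estimated_ate:.3f} (true effect = {TRUE_EFFECT})")
```

Because assignment ignores everything about the units, the difference in group averages should come out close to the assumed true effect, up to sampling noise.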

Randomized Controlled Trials (RCTs) II

Classic (simplified) procedure of a randomized controlled trial (RCT) from medicine

Randomized Controlled Trials (RCTs) III

  • Random assignment to groups ensures that, on average, the only difference between members of the treatment and control groups is whether they receive treatment

Figure: Treatment Group and Control Group

Randomized Controlled Trials (RCTs) IV

  • Selection bias: (pre-existing) differences between members of treatment and control groups, other than treatment, that affect the outcome

Figure: Treatment Group and Control Group (Selection Bias)

Some Theory By Example

Example: The Effect of Having Health Insurance I

Example: What is the effect of having health insurance on health outcomes?

  • National Health Interview Survey (NHIS) asks "Would you say your health in general is excellent, very good, good, fair, or poor?"

  • Outcome variable (Y): Index of health (1-poor to 5-excellent) in a sample of married NHIS respondents in 2009 who may or may not have health insurance

  • Treatment (X): Having health insurance (vs. not)

Example: The Effect of Having Health Insurance II

Angrist, Joshua & Jörn-Steffen Pischke, 2015, Mastering 'Metrics

Example: The Effect of Having Health Insurance III

  • $Y$: outcome variable (health index score, 1-5)

  • $Y_i$: health score of individual $i$

  • Individual $i$ has a choice, leading to one of two potential outcomes:

    • $Y_{0i}$: health score of individual $i$ if they have not purchased health insurance

    • $Y_{1i}$: health score of individual $i$ if they have purchased health insurance

  • $Y_{1i} - Y_{0i}$: causal effect for individual $i$ of purchasing health insurance

Example: A Hypothetical Comparison

John                      Maria
$Y_{0J} = 3$              $Y_{0M} = 5$
$Y_{1J} = 4$              $Y_{1M} = 5$
$Y_{1J} - Y_{0J} = 1$     $Y_{1M} - Y_{0M} = 0$
$Y_J = Y_{1J} = 4$        $Y_M = Y_{0M} = 5$

  • John will choose to buy health insurance

  • Maria will choose to not buy health insurance

  • Health insurance improves John's score by 1, has no effect on Maria's score

  • Note, all we can observe in the data are their health outcomes after they have chosen (not) to buy health insurance: $Y_J = 4$ and $Y_M = 5$

  • Observed difference between John and Maria: $Y_J - Y_M = -1$

Counterfactuals

John          Maria
$Y_J = 4$     $Y_M = 5$

This is all the data we actually observe

  • Observed difference between John and Maria: $Y_J - Y_M = Y_{1J} - Y_{0M} = -1$

  • Recall:

    • John has bought health insurance ($Y_{1J}$)
    • Maria has not bought insurance ($Y_{0M}$)
  • We don't see the counterfactuals:

    • John's score without insurance ($Y_{0J}$)
    • Maria's score with insurance ($Y_{1M}$)

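  • Algebra trick: add and subtract $Y_{0J}$ to the equation

$$Y_J - Y_M = \underbrace{Y_{1J} - Y_{0J}}_{=1} + \underbrace{Y_{0J} - Y_{0M}}_{=-2} = -1$$

  • $Y_{1J} - Y_{0J} = 1$: Causal effect for John of buying insurance
  • $Y_{0J} - Y_{0M} = -2$: Difference between John & Maria pre-treatment, "selection bias"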
Example II

$$Y_{0J} - Y_{0M} \neq 0$$

  • Selection bias: (pre-existing) differences between members of treatment and control groups, other than treatment, that affect the outcome
    • i.e. John and Maria start out with very different health scores before either decides whether to buy insurance ("receive treatment" or not)

Figure: John (Treated) and Maria (Control)

Example: Thinking about the Data

Ideal (but impossible) Data

Individual   Insured   Not Insured   Diff
John         4.0       3.0           1.0
Maria        5.0       5.0           0.0
Average      4.5       4.0           0.5

  • Individual treatment effect (for individual $i$): $TE_i = Y_{1i} - Y_{0i}$
  • Average treatment effect: $ATE = \frac{1}{n} \sum_{i=1}^{n} (Y_{1i} - Y_{0i})$

Actual (observed) Data

Individual   Insured   Not Insured   Diff
John         4.0       ?             ?
Maria        ?         5.0           ?
Average      ?         ?             ?

  • We never get to see each person's counterfactual state, so we cannot compare the two states and calculate individual treatment effects or the ATE
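The contrast between the two tables can be written out as a short sketch. The numbers are the ones in the tables; representing a missing counterfactual as `None` and the helper name `individual_effect` are assumptions for this illustration.

```python
# Ideal (but impossible) data: both potential outcomes for each person.
ideal = {
    "John":  {"insured": 4.0, "not_insured": 3.0},
    "Maria": {"insured": 5.0, "not_insured": 5.0},
}

# Individual treatment effects and their average (the ATE).
effects = {name: row["insured"] - row["not_insured"] for name, row in ideal.items()}
ate = sum(effects.values()) / len(effects)

# Actual (observed) data: the counterfactual cell is missing (None).
observed = {
    "John":  {"insured": 4.0, "not_insured": None},
    "Maria": {"insured": None, "not_insured": 5.0},
}

def individual_effect(row):
    """Return insured - not_insured, or None when either outcome is unobserved."""
    if row["insured"] is None or row["not_insured"] is None:
        return None
    return row["insured"] - row["not_insured"]

observed_effects = {name: individual_effect(row) for name, row in observed.items()}

print(effects)           # {'John': 1.0, 'Maria': 0.0}
print(ate)               # 0.5
print(observed_effects)  # {'John': None, 'Maria': None}
```

With the ideal data the ATE is a simple average; with the observed data no individual effect can be computed at all, which is why the ATE has to be recovered some other way.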

Example: Thinking about the Data

  • Basic comparisons tell us something about outcomes, but not ATE (the causal effect we're looking for)

Difference in Group Outcomes = ATE + Selection Bias

  • Selection bias: difference in average $Y_{0i}$ between groups pre-treatment

  • $Y_{0i}$ includes everything about person $i$ relevant to health except treatment (insurance) status

    • Age, sex, height, weight, climate, smoker, exercise, diet, etc.
    • Imagine a world where nobody has insurance (treatment): who would have the highest health scores?
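One way to see where this decomposition comes from is the short derivation below. It uses notation that is not in the slides and should be read as an assumption: $D_i = 1$ if person $i$ is treated (insured) and $D_i = 0$ otherwise, and the treatment effect is taken to be the same constant $\kappa$ for everyone, so that $Y_{1i} = Y_{0i} + \kappa$ and the ATE equals $\kappa$.

```latex
\begin{align*}
\underbrace{E[Y_i \mid D_i = 1] - E[Y_i \mid D_i = 0]}_{\text{difference in group outcomes}}
  &= E[Y_{1i} \mid D_i = 1] - E[Y_{0i} \mid D_i = 0] \\
  &= E[Y_{0i} + \kappa \mid D_i = 1] - E[Y_{0i} \mid D_i = 0] \\
  &= \underbrace{\kappa}_{\text{ATE}}
   + \underbrace{E[Y_{0i} \mid D_i = 1] - E[Y_{0i} \mid D_i = 0]}_{\text{selection bias}}
\end{align*}
```

The selection bias term is zero only if the two groups would have had the same average outcome without treatment.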

Random Assignment: The Silver Bullet

  • If treatment is randomly assigned for a large sample, it eliminates selection bias!

  • Treatment and control groups differ on average by nothing except treatment status

  • Creates ceteris paribus conditions in economics: groups are identical on average (holding constant age, sex, height, etc.)

Figure: Treatment Group and Control Group
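The claim that random assignment removes selection bias can be illustrated by comparing self-selection with a coin flip in the same simulated population. This is a sketch under assumptions that are not in the slides: 20,000 hypothetical people, baseline health scores drawn from a normal distribution, a constant true effect of 0.5, and a self-selection rule where people with below-average baseline health buy insurance.

```python
import random

rng = random.Random(1)
TRUE_EFFECT = 0.5  # assumed constant effect of insurance on the health score

# Hypothetical population: each value is y0, health without insurance.
people = [rng.gauss(4.0, 1.0) for _ in range(20_000)]

def difference_in_means(treated_flags):
    """Average observed outcome of the treated minus that of the untreated."""
    treated = [y0 + TRUE_EFFECT for y0, d in zip(people, treated_flags) if d]
    control = [y0 for y0, d in zip(people, treated_flags) if not d]
    return sum(treated) / len(treated) - sum(control) / len(control)

# Self-selection (assumed rule): people in worse baseline health buy insurance.
self_selected = [y0 < 4.0 for y0 in people]

# Random assignment: a coin flip decides, regardless of baseline health.
randomized = [rng.random() < 0.5 for _ in people]

naive_estimate = difference_in_means(self_selected)
rct_estimate = difference_in_means(randomized)
print(f"self-selected comparison: {naive_estimate:.2f}")
print(f"randomized comparison:    {rct_estimate:.2f} (true effect = {TRUE_EFFECT})")
```

Under self-selection the comparison should come out negative, as if insurance harmed health, because the insured group started out less healthy. Under random assignment the comparison should land close to the assumed true effect.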

Quasi-Experiments

The Quest for Causal Effects I

  • RCTs are considered the "gold standard" for causal claims

  • But society is not our laboratory (probably a good thing!)

  • We can rarely conduct experiments to get data

The Quest for Causal Effects II

  • Instead, we often rely on observational data

  • This data is not random!

  • Must take extra care in forming an identification strategy

  • To make good claims about causation in society, we must get clever!

Natural Experiments

  • Economists often resort to searching for natural experiments

  • Some events beyond our control occur that separate otherwise similar entities into a "treatment" group and a "control" group that we can compare

  • e.g. natural disasters, U.S. State laws, military draft

The First Natural Experiment

John Snow (1813-1858)

  • John Snow utilized the first famous natural experiment to establish the foundations of epidemiology and the germ theory of disease

  • Water pumps with sources downstream of a sewage dump in the Thames river spread cholera while water pumps with sources upstream did not

Famous Natural Experiments

  • Oregon Health Insurance Experiment: Oregon used a lottery to grant Medicaid access to 10,000 people, showing that access to Medicaid increased use of health services, lowered debt, etc. relative to those not on Medicaid
  • Angrist (1990) finds that lifetime earnings of (randomly) drafted Vietnam veterans are 15% lower than those of non-veterans
  • Card & Krueger (1994) find that a minimum wage hike for fast-food restaurants on the NJ side of the border had no disemployment effects relative to restaurants on the PA side of the border during the same period
  • Acemoglu, Johnson, and Robinson (2001) find that inclusive institutions lead to higher economic development than extractive institutions, determined by a colony's disease environment in 1500
  • We will look at some of these in greater detail throughout the course
  • A great list, with explanations is here

For the Next Few Classes

The 2 Big Problems with Data
