^β1 ∼ N(E[^β1], σ^β1)

E[^β1]: the center of the distribution (2 classes ago)
σ^β1: how precise is our estimate? (last class)

1 Under the 4 assumptions about u (particularly, cor(X, u) = 0).
Problem for identification: endogeneity
Problem for inference: randomness
OLS estimators (^β0 and ^β1) are computed from a finite (specific) sample of data
Our OLS model contains 2 sources of randomness:
Modeled randomness: u includes all factors affecting Y other than X
Sampling randomness: which particular observations end up in our specific sample
Inferential statistics analyzes a sample to make inferences about a much larger (unobservable) population
Population: all possible individuals that match some well-defined criterion of interest (people, firms, cities, etc)
Sample: some portion of the population of interest to represent the whole
Sample → (statistical inference) → Population → (causal identification) → Unobserved Parameters
We want to identify causal relationships between population variables
We'll use sample statistics to infer something about population parameters
Estimation: use our sample data to construct a point estimate of a population parameter and subject it to a hypothesis test
Confidence interval: use our sample data to construct a range for the population parameter
First method is more common, but second is still widely acknowledged
Both will give you similar results
Note: statistical inference is different from causal inference!
We have already used statistics to estimate a relationship between X and Y
We want to test whether these estimates are statistically significant and describe the population
Examples:
Note, we can test a lot of hypotheses about a lot of population parameters, e.g.
We will focus only on hypotheses about the population regression slope (β1), i.e. the causal effect1 of X on Y
1 With a model this simple, it's almost certainly not causal, but this is the ultimate direction we are heading...
A null hypothesis, H0
An alternative hypothesis, Ha
A test statistic to determine if we reject H0 when the statistic reaches a "critical value"
A conclusion whether or not to reject H0 in favor of Ha
Any sample statistic (e.g. ^β1) will rarely be exactly equal to the hypothesized population parameter (e.g. β1)
Difference between observed statistic and true parameter could be because:
Parameter is not the hypothesized value (H0 is false)
Parameter is truly the hypothesized value (H0 is true) but sampling variability gave us a different estimate
Type I error (false positive): rejecting H0 when it is in fact true
Type II error (false negative): failing to reject H0 when it is in fact false
Judgment \ Truth | Null is True | Null is False
---|---|---
Reject Null | TYPE I ERROR (False +) | CORRECT (True +)
Don't Reject Null | CORRECT (True -) | TYPE II ERROR (False -)
Judgment \ Truth | Defendant is Innocent | Defendant is Guilty
---|---|---
Convict | TYPE I ERROR (False +) | CORRECT (True +)
Acquit | CORRECT (True -) | TYPE II ERROR (False -)

Anglo-American common law presumes the defendant is innocent: H0
Jury judges whether the evidence presented against the defendant is plausible assuming the defendant were in fact innocent
If highly improbable: sufficient evidence to reject H0 and convict
William Blackstone
(1723-1780)
"It is better that ten guilty persons escape than that one innocent suffer."
Blackstone, William, 1765-1770, Commentaries on the Laws of England
The probability of a Type I error is defined as α:
α = P(Reject H0 | H0 is true)
The confidence level is defined as (1−α)
The probability of a Type II error is defined as β:
β=P(Don't reject H0|H0 is false)
Judgment \ Truth | Null is True | Null is False
---|---|---
Reject Null | TYPE I ERROR (α) | CORRECT (1−β)
Don't Reject Null | CORRECT (1−α) | TYPE II ERROR (β)
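Since α is a conditional probability, a quick simulation can make it concrete. This is my own minimal sketch (not from the lecture): repeatedly test a true null hypothesis with base R's `t.test()` and count false positives; the sample size of 30 and the 5,000 repetitions are arbitrary choices.

```r
# Simulate the Type I error rate: draw many samples from a world where
# H0 (true mean = 0) is TRUE, and count how often a t-test rejects at alpha = 0.05
set.seed(1)  # for reproducible draws
rejections <- replicate(5000, t.test(rnorm(n = 30, mean = 0))$p.value < 0.05)
mean(rejections)  # proportion of false positives; should be close to alpha = 0.05
```

Even though the null is true in every repetition, roughly 5% of tests reject it, which is exactly what α = 0.05 promises.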
Power=1−β=P(Reject H0|H0 is false)
p-value: p(δ ≥ δi | H0 is true)
After running our test, we need to make a decision between the competing hypotheses
Compare p-value with pre-determined α (commonly, α=0.05, 95% confidence level)
Sir Ronald A. Fisher
(1890—1962)
"The null hypothesis is never proved or established, but is possibly disproved, in the course of experimentation. Every experiment may be said to exist only in order to give the facts a chance of disproving the null hypothesis."
1935, The Design of Experiments
Modern philosophy of science is largely based on hypothesis testing and falsifiability, which form the "Scientific Method"1
For something to be "scientific", it must be falsifiable, or at least testable
Hypotheses can be corroborated with evidence, but remain tentative until falsified by data suggesting an alternative hypothesis
"All swans are white" is a hypothesis rejected upon discovery of a single black swan
A rigorous statistics course (ECMG 212 or MATH 112) will spend weeks going through different types of tests:
See today's class notes page for more
We will instead use an R package called infer
Calculate a statistic, δi1, from a sample of data
Simulate a world where δ is null (H0)
Examine the distribution of δ across the null world
Calculate the probability that δi could exist in the null world
Decide if δi is statistically significant
1 δ can stand in for any test-statistic in any hypothesis test! For our purposes, δ is the slope of our regression sample, ˆβ1.
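The five steps above map directly onto the infer verbs introduced on the following slides. As a sketch, assuming the CASchool data (with testscr and str) and a saved sample slope called sample_slope, both of which appear later in these slides:

```r
library(dplyr)
library(infer)

null_distribution <- CASchool %>%
  specify(testscr ~ str) %>%                  # 1. the statistic comes from this model
  hypothesize(null = "independence") %>%      # 2. a null world where beta_1 = 0
  generate(reps = 1000, type = "permute") %>% # 3. simulate that null world
  calculate(stat = "slope")                   # distribution of slopes under H0

# 4. probability our observed slope could exist in the null world
null_distribution %>%
  get_p_value(obs_stat = sample_slope, direction = "both")
# 5. if the p-value falls below alpha (e.g. 0.05), call the slope statistically significant
```

Each verb is unpacked one at a time below.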
lm() automatically runs a hypothesis test on our slope:

H0: β1 = 0
Ha: β1 ≠ 0

The infer package allows you to run through these steps manually to understand the process:

specify() a model
hypothesize() the null
generate() simulations of the null world
calculate() the p-value
visualize() with a histogram (optional)
Test statistic (δ): measures how far what we observed in our sample (^β1) is from what we would expect if the null hypothesis were true (β1=0)
Rejection region: if the test statistic reaches a "critical value" of δ, then we reject the null hypothesis
1 Again, see today's class notes for more on the t-distribution. k is the number of independent variables our model has, in this case, with just one X, k=1. We use two degrees of freedom to calculate ^β0 and ^β1, hence we have n−2 df.
Our world, and a world where β1=0 by assumption.
term | estimate | std.error
---|---|---
(Intercept) | 698.932952 | 9.4674914
str | -2.279808 | 0.4798256
term | estimate | std.error
---|---|---
(Intercept) | 647.8027952 | 9.7147718
str | 0.3235038 | 0.4923581
# save as sample_slope
sample_slope <- school_reg_tidy %>% # this is the regression tidied with broom
  filter(term == "str") %>%
  pull(estimate)

# confirm what it is
sample_slope

## [1] -2.279808
data %>% specify(y ~ x)

We start with the specify() function, which is essentially a lm() function for regression (for our purposes):

CASchool %>%
  specify(testscr ~ str)
testscr | str
---|---
690.8 | 17.88991
661.2 | 21.52466
643.6 | 18.69723
%>% hypothesize(null = "independence")

In infer's language, we are hypothesizing that str and testscr are independent (β1 = 0)1:

CASchool %>%
  specify(testscr ~ str) %>%
  hypothesize(null = "independence")
testscr | str
---|---
690.8 | 17.88991
661.2 | 21.52466
643.6 | 18.69723
1 type can be either "point" (for specific point estimates for a single variable, such as a sample mean, x̄) or "independence" (for hypotheses about two samples or a relationship between variables). See more here.
%>% generate(reps = n, type = "permute")

We set the number of reps and set the type equal to "permute":

i %>%
  generate(reps = 1000, type = "permute")

1 Note: for spacing on the slide, I saved the previous code as i and pipe it into the remainder.
%>% generate(reps = n, type = "permute")

"bootstrap" takes a random draw of our existing sample's observations (of the same number of observations) with replacement
"permute" is a bootstrap without replacement1

1 You can do either of these in base R with sample(), which has 3 arguments: a vector to sample from, size (number of obs), and replace equal to TRUE or FALSE. See more for infer here.
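To make the footnote concrete, here is a small base-R sketch contrasting the two resampling types (my own toy vector, not the class data):

```r
set.seed(42)  # for reproducible draws
x <- c(10, 20, 30, 40, 50)

# "permute": reshuffle without replacement -- the same five values, new order
sample(x, size = length(x), replace = FALSE)

# "bootstrap": draw with replacement -- some values repeat, others drop out
sample(x, size = length(x), replace = TRUE)
```

Permuting breaks the pairing between X and Y while keeping each variable's values intact, which is exactly what a β1 = 0 null world requires.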
%>% calculate(stat = "")

We calculate sample statistics for each of the 1,000 replicate samples
In our case, calculate the slope (^β1) for each replicate:

i %>%
  generate(reps = 1000, type = "permute") %>%
  calculate(stat = "slope")

Other stats available for calculation: "mean", "median", "prop", "diff in means", "diff in props", etc. (see package information)
%>% get_p_value(obs_stat = "", direction = "both")

We can calculate the p-value of our sample_slope (-2.28) in our simulated null distribution
For our two-sided alternative Ha: β1 ≠ 0, we double the raw p-value:

simulations %>%
  get_p_value(obs_stat = sample_slope, direction = "both")

p_value
---
0

Note: here I saved the results of our previous code as simulations for spacing.
%>% get_ci(level = 0.95, type = "se", point_estimate = "")

We use our sample_slope (^β1 of -2.28) as the point_estimate:

simulations %>%
  get_confidence_interval(level = 0.95, type = "se", point_estimate = sample_slope)

lower | upper
---|---
-3.234823 | -1.324793
%>% visualize()
simulations %>% visualize()
%>% visualize()

We can add our sample_slope to show our finding on the null distribution:

simulations %>%
  visualize(obs_stat = sample_slope)
%>% visualize() + shade_p_value()

We can add shade_p_value to see what p is:

simulations %>%
  visualize(obs_stat = sample_slope) +
  shade_p_value(obs_stat = sample_slope, direction = "two_sided")
%>% visualize() + shade_ci()

We can also shade the confidence interval; I saved the tibble of CI values from 4 slides ago as ci_values:

simulations %>%
  visualize(obs_stat = sample_slope) +
  shade_confidence_interval(ci_values)
infer's visualize() function is just a wrapper function for ggplot()
We can take our simulations tibble and just ggplot a normal histogram:

simulations %>%
  ggplot(data = .) +
  aes(x = stat) +
  geom_histogram(color = "white", fill = "indianred") +
  geom_vline(xintercept = sample_slope, color = "blue", size = 2, linetype = "dashed") +
  labs(x = expression(paste("Distribution of ", hat(beta[1]), " under ", H[0], " that ", beta[1]==0)),
       y = "Samples") +
  theme_classic(base_family = "Fira Sans Condensed", base_size = 20)
R does things the old-fashioned way, using a theoretical null distribution instead of simulation
A t-distribution with n−k−1 df1
Calculate a t-statistic for ^β1:
test statistic = (estimate − null hypothesis value) / (standard error of estimate)
1 k is the number of X variables.
test statistic = (estimate − null hypothesis value) / (standard error of estimate)
t has the same interpretation as Z, number of std. dev. away from the distribution's center1
Compares to a critical value of t∗ (determined by α & n−k−1)
1 Think of our simulated distribution, the center was 0.
2 The 68-95-99.7% empirical rule!
t = (^β1 − β1,0) / se(^β1)
t = (−2.28 − 0) / 0.48
t = −4.75
Our sample slope is 4.75 standard deviations below the mean under H0
p-value: prob. of a test statistic at least as large (in magnitude) as ours if the null hypothesis were true1
1 Think of our simulated distribution, the center was 0.
Ha:β1<0
p-value: Prob(t<ti)
Ha:β1>0
p-value: Prob(t>ti)
Ha:β1≠0
p-value: 2×Prob(t>|ti|)
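These tail probabilities can be checked by hand with R's t-distribution functions. A sketch using the slope estimate and standard error from the regression output below (420 observations, so n − k − 1 = 418 df):

```r
beta_hat <- -2.279808   # slope estimate from the regression
se_beta  <- 0.4798256   # its standard error
t_stat   <- (beta_hat - 0) / se_beta  # about -4.75

# two-sided p-value: 2 x Prob(t > |t_i|) on 418 degrees of freedom
p_value <- 2 * pt(abs(t_stat), df = 418, lower.tail = FALSE)
p_value  # matches the ~2.78e-06 reported in the lm() summary
```

This is exactly the calculation lm() performs for the Pr(>|t|) column.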
summary(school_reg)
## 
## Call:
## lm(formula = testscr ~ str, data = CASchool)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -47.727 -14.251   0.483  12.822  48.540 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 698.9330     9.4675  73.825  < 2e-16 ***
## str          -2.2798     0.4798  -4.751 2.78e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 18.58 on 418 degrees of freedom
## Multiple R-squared:  0.05124, Adjusted R-squared:  0.04897 
## F-statistic: 22.58 on 1 and 418 DF,  p-value: 2.783e-06
Using broom's tidy() (with confidence intervals):

tidy(school_reg, conf.int = TRUE)

term | estimate | std.error | statistic | p.value | conf.low | conf.high
---|---|---|---|---|---|---
(Intercept) | 698.932952 | 9.4674914 | 73.824514 | 6.569925e-242 | 680.32313 | 717.542779
str | -2.279808 | 0.4798256 | -4.751327 | 2.783307e-06 | -3.22298 | -1.336637
H0: β1 = 0
Ha: β1 ≠ 0

Because the hypothesis test's p-value < α (0.05)...
We have sufficient evidence to reject H0 in favor of our alternative hypothesis. Our sample suggests that there is a relationship between class size and test scores.
Using the confidence intervals:
We are 95% confident that the true marginal effect of class size on test scores is between −3.22 and −1.34.
Confidence intervals are all two-sided by nature:
CI0.95 = (^β1 − 2 × se(^β1), ^β1 + 2 × se(^β1))

Hypothesis test (t-test) of H0: β1 = 0 computes a t-value of1:
t = ^β1 / se(^β1)

If a confidence interval contains the H0 value (i.e. 0, for our test), then we fail to reject H0.

1 Since our null hypothesis is that β1,0 = 0, the test statistic simplifies to this neat fraction.
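The interval can be rebuilt by hand with qt(); the "2" in the formula above approximates the exact t critical value:

```r
beta_hat <- -2.279808   # slope estimate
se_beta  <- 0.4798256   # its standard error
t_crit   <- qt(0.975, df = 418)  # about 1.97 for a 95% interval with 418 df

c(lower = beta_hat - t_crit * se_beta,
  upper = beta_hat + t_crit * se_beta)
# close to tidy()'s conf.low / conf.high of -3.22 and -1.34
```

Since 0 lies well outside this interval, the CI and the t-test agree: reject H0.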
Common misinterpretations (what p is NOT):
p is the probability that the alternative hypothesis is false
p is the probability that the null hypothesis is true
p is the probability that our observed effects were produced purely by random chance
p tells us how significant our finding is
"The widespread use of 'statistical significance' (generally interpreted as (p≤0.05) as a license for making a claim of a scientific finding (or implied truth) leads to considerable distortion of the scientific process."
Wasserstein, Ronald L. and Nicole A. Lazar (2016), "The ASA's Statement on p-Values: Context, Process, and Purpose," The American Statistician 70(2): 129-133
Again, p-value is the probability that, assuming the null hypothesis is true, we obtain (by pure random chance) a test statistic at least as extreme as the one we estimated for our sample
A low p-value means either (and we can't distinguish which):
The parameter is not the hypothesized value (H0 is false), or
H0 is true, but sampling variability gave us an unusually extreme sample
 | Test Score
---|---
Intercept | 698.93 *** (9.47)
STR | -2.28 *** (0.48)
N | 420
R-Squared | 0.05
SER | 18.58

*** p < 0.001; ** p < 0.01; * p < 0.05
Statistical significance is shown by asterisks, common (but not always!) standard:
Rare, but sometimes regression tables include p-values for estimates