2.1: Data 101 and Descriptive Statistics - Class Notes

Overview
Slides
Problem Set
Math Appendix
- The Summation Operator

Tuesday, September 17, 2019

Overview

Today we pick up where we left off on causality. We begin with a review and overview of using data and descriptive statistics. We want to quantify characteristics of samples as statistics, which we will later use to infer things about populations (and, later still, to identify causal relationships).

Next class will be on random variables and distributions. This full week is your crash course/review of the basic statistics we will need before starting the “meat and potatoes” of this class, linear regression, next Tuesday. As such, I’ll give you a brief homework to review these statistical concepts (with minimal use of R).

Slides

Lecture Slides

Problem Set

Problem Set 1 is due Thursday, September 19.

Problem Set 2 (on classes 2.1-2.2) will be posted shortly and is due by Tuesday, September 24.

Math Appendix

The Summation Operator

Definition

Many elementary propositions in econometrics (and statistics) involve sums of numbers. Mathematicians often use the summation operator (the Greek letter $\Sigma$, “sigma”) as a shorthand, rather than writing everything out the long way. It is worth your time to understand the summation operator and some of its properties, and how these can provide shortcuts to proving more advanced theorems in econometrics.

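Let $X$ be a random variable from which a sample of $n$ observations is observed, so we have a sequence $\{x_1, x_2, \ldots, x_n\}$, i.e. $x_i$ for $i = 1, 2, \ldots, n$. Then the total sum of the observations $(x_1 + x_2 + \cdots + x_n)$ can be represented as:

$$\sum_{i=1}^n x_i = x_1 + x_2 + \cdots + x_n$$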

- The term beneath $\Sigma$ is known as the “index,” which tells us where to begin our adding (at the 1st individual term, $i = 1$)
  - Note other letters, such as $j$ or $k$, may be used (especially if $i$ is defined elsewhere)
- The term above $\Sigma$ is the total number of terms we should add ($n$)
- Essentially, read $\sum_{i=1}^n x_i$ as “add up all the individual $x_i$ observations from the 1st to the final, $n$th.”
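As a quick illustration, the definition can be mirrored in a few lines of Python (used here only as a sketch; the sample values and the function name `summation` are made up for illustration). The loop runs the index from 1 to n, just as the notation reads.

```python
# Hypothetical sample of n = 5 observations (values are illustrative only)
x = [2, 4, 6, 8, 10]
n = len(x)

def summation(values):
    """Add up x_1 + x_2 + ... + x_n for the observations in `values`."""
    total = 0
    for i in range(1, len(values) + 1):  # the index i runs from 1 to n
        total += values[i - 1]           # Python lists are 0-indexed, so x_i is values[i - 1]
    return total

print(summation(x))  # 30
print(sum(x))        # 30, the built-in sum gives the same result
```

In practice the built-in `sum` does the same job; the explicit loop is only there to show what the index is doing.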

Useful Properties of Summation Operators

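Rule 1: The summation of a constant times a random variable is equal to the constant times the summation of that random variable:

$$\sum_{i=1}^n a x_i = a \sum_{i=1}^n x_i$$

Proof:

$$\sum_{i=1}^n a x_i = a x_1 + a x_2 + \cdots + a x_n = a (x_1 + x_2 + \cdots + x_n) = a \sum_{i=1}^n x_i$$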

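Rule 2: The summation of a sum of two random variables is equal to the sum of their summations:

$$\sum_{i=1}^n (x_i + y_i) = \sum_{i=1}^n x_i + \sum_{i=1}^n y_i$$

Proof:

$$\sum_{i=1}^n (x_i + y_i) = (x_1 + y_1) + (x_2 + y_2) + \cdots + (x_n + y_n) = (x_1 + x_2 + \cdots + x_n) + (y_1 + y_2 + \cdots + y_n) = \sum_{i=1}^n x_i + \sum_{i=1}^n y_i$$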

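Rule 3: The summation of a constant $a$ over $n$ observations is the product of the constant and $n$:

$$\sum_{i=1}^n a = n a$$

Proof:

$$\sum_{i=1}^n a = \underbrace{a + a + \cdots + a}_{n \text{ times}} = n a$$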

Combining these 3 rules: for the sum of a linear combination of a random variable ($a + b x_i$):

$$\sum_{i=1}^n (a + b x_i) = n a + b \sum_{i=1}^n x_i$$

Proof: left to you as an exercise!
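Rules 1 to 3 and their combination can be sanity-checked numerically. The sketch below uses Python with made-up values for `x`, `y`, `a`, and `b`; a check on one sample is not a proof, but it shows what each identity claims.

```python
import math

# Illustrative values only (not from the lecture)
x = [1.0, 3.0, 5.0, 7.0]
y = [2.0, 2.5, 4.0, 6.5]
a, b = 3.0, 0.5
n = len(x)

# Rule 1: sum(a * x_i) equals a * sum(x_i)
rule1 = math.isclose(sum(a * xi for xi in x), a * sum(x))

# Rule 2: sum(x_i + y_i) equals sum(x_i) + sum(y_i)
rule2 = math.isclose(sum(xi + yi for xi, yi in zip(x, y)), sum(x) + sum(y))

# Rule 3: summing the constant a over n observations gives n * a
rule3 = math.isclose(sum(a for _ in range(n)), n * a)

# Combined: sum(a + b * x_i) equals n * a + b * sum(x_i)
combined = math.isclose(sum(a + b * xi for xi in x), n * a + b * sum(x))

print(rule1, rule2, rule3, combined)  # True True True True
```

`math.isclose` is used instead of `==` because floating-point sums can differ by tiny rounding errors even when the identity holds exactly.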

Advanced: Useful Properties for Regression

There are some additional properties of summations that may not be immediately obvious, but will be quite essential in proving properties of linear regressions.

Using the properties above, we can describe the mean, variance, and covariance of random variables. For more beyond the mere definitions, see the appendix on Covariance and Correlation.

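First, define the mean of a sequence $\{x_i : i = 1, \ldots, n\}$ (and, analogously, $\bar{y}$ for $\{y_i : i = 1, \ldots, n\}$) as:

$$\bar{x} = \frac{1}{n} \sum_{i=1}^n x_i$$

Second, the (sample) variance of $X$ is:

$$var(X) = \frac{1}{n-1} \sum_{i=1}^n (x_i - \bar{x})^2$$

Third, the (sample) covariance of $X$ and $Y$ is:

$$cov(X, Y) = \frac{1}{n-1} \sum_{i=1}^n (x_i - \bar{x})(y_i - \bar{y})$$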
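A minimal Python sketch of these three definitions, assuming the sample versions of the variance and covariance (dividing by n - 1); the data and function names are illustrative, not from the lecture. The standard library's `statistics.mean` and `statistics.variance` appear only as a cross-check.

```python
import math
import statistics

# Illustrative data only
x = [1.0, 3.0, 5.0, 7.0]
y = [2.0, 2.5, 4.0, 6.5]

def mean(v):
    # (1 / n) times the sum of the observations
    return sum(v) / len(v)

def variance(v):
    # Sum of squared deviations from the mean, divided by n - 1 (assumed convention)
    m = mean(v)
    return sum((vi - m) ** 2 for vi in v) / (len(v) - 1)

def covariance(v, w):
    # Sum of products of paired deviations, divided by n - 1 (assumed convention)
    mv, mw = mean(v), mean(w)
    return sum((vi - mv) * (wi - mw) for vi, wi in zip(v, w)) / (len(v) - 1)

print(mean(x), variance(x), covariance(x, y))

# Cross-check the hand-written versions against the standard library
print(math.isclose(mean(x), statistics.mean(x)))
print(math.isclose(variance(x), statistics.variance(x)))
```

Note that the covariance of a variable with itself is its variance, so `covariance(x, x)` and `variance(x)` agree.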

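Rule 4: The sum of the deviations of observations of $x_i$ from its mean ($\bar{x}$) is 0:

$$\sum_{i=1}^n (x_i - \bar{x}) = 0$$

Proof:

$$\sum_{i=1}^n (x_i - \bar{x}) = \sum_{i=1}^n x_i - \sum_{i=1}^n \bar{x} = \sum_{i=1}^n x_i - n \bar{x} = n \bar{x} - n \bar{x} = 0$$

The last step uses the definition of the mean, which implies $\sum_{i=1}^n x_i = n \bar{x}$.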

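Rule 5: The sum of the squared deviations of $x_i$ is equal to the sum of the products of $x_i$ times its deviations:

$$\sum_{i=1}^n (x_i - \bar{x})^2 = \sum_{i=1}^n x_i (x_i - \bar{x})$$

Proof:

$$\sum_{i=1}^n (x_i - \bar{x})^2 = \sum_{i=1}^n (x_i - \bar{x})(x_i - \bar{x}) = \sum_{i=1}^n x_i (x_i - \bar{x}) - \bar{x} \sum_{i=1}^n (x_i - \bar{x}) = \sum_{i=1}^n x_i (x_i - \bar{x})$$

The second term vanishes because the deviations from the mean sum to 0 (Rule 4).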

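Rule 6: The following summations involving $x_i$ and $y_i$ are equivalent:

$$\sum_{i=1}^n y_i (x_i - \bar{x}) = \sum_{i=1}^n x_i (y_i - \bar{y}) = \sum_{i=1}^n (x_i - \bar{x})(y_i - \bar{y})$$

Proof:

$$\sum_{i=1}^n (x_i - \bar{x})(y_i - \bar{y}) = \sum_{i=1}^n y_i (x_i - \bar{x}) - \bar{y} \sum_{i=1}^n (x_i - \bar{x}) = \sum_{i=1}^n y_i (x_i - \bar{x})$$

equivalently:

$$\sum_{i=1}^n (x_i - \bar{x})(y_i - \bar{y}) = \sum_{i=1}^n x_i (y_i - \bar{y}) - \bar{x} \sum_{i=1}^n (y_i - \bar{y}) = \sum_{i=1}^n x_i (y_i - \bar{y})$$

In both lines the subtracted term is 0 because deviations from the mean sum to 0 (Rule 4).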
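Rules 4 to 6 can likewise be checked numerically. This is a hedged sketch in Python with made-up data; the variable names are illustrative, and passing on one sample illustrates the identities rather than proving them.

```python
import math

# Illustrative data only
x = [1.0, 3.0, 5.0, 7.0]
y = [2.0, 2.5, 4.0, 6.5]
x_bar = sum(x) / len(x)
y_bar = sum(y) / len(y)

# Rule 4: deviations from the mean sum to zero
# (abs_tol is needed because isclose compares against exactly 0.0)
rule4 = math.isclose(sum(xi - x_bar for xi in x), 0.0, abs_tol=1e-12)

# Rule 5: sum of squared deviations equals sum of x_i times its deviation
rule5 = math.isclose(sum((xi - x_bar) ** 2 for xi in x),
                     sum(xi * (xi - x_bar) for xi in x))

# Rule 6: three equivalent ways to write the cross-product sum
s_dev = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
s_y = sum(yi * (xi - x_bar) for xi, yi in zip(x, y))
s_x = sum(xi * (yi - y_bar) for xi, yi in zip(x, y))
rule6 = math.isclose(s_dev, s_y) and math.isclose(s_dev, s_x)

print(rule4, rule5, rule6)  # True True True
```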
