Problem Set 2
Due by Thursday, September 26, 2019
ANSWERS:
Instructions
For this problem set, you may submit handwritten answers on a plain sheet of paper, or download and type/handwrite on the PDF.
Alternatively, you may download the .Rmd
file, do the homework in markdown, and email to me a single knit
ted html
or pdf
file (and be sure that it shows all of your code).
You may work together (and I highly encourage that) but you must turn in your own answers. I grade homeworks 70% for completion, and for the remaining 30%, pick one question to grade for accuracy - so it is best that you try every problem, even if you are unsure how to complete it accurately.
Theory and Concepts
Question 1
Part A
In your own words, explain the difference between endogeneity and exogeneity.Hint: you may want to look back at class 1.2 for this question.
Part B
In your own words, explain how conducting a randomized controlled trial helps to identify the causal connection between two variables.
Question 2
Part A
In your own words, explain what (sample) standard deviation means.
Part B
In your own words, explain how (sample) standard deviation is calculated. You may also write the formula, but it is not necessary.
Problems
For the remaining questions, you may use R
to verify, but please calculate all sample statistics by hand and show all work.
Question 3
Suppose you have a very small class of four students that all take a quiz. Their scores are reported as follows:
{83,92,72,81}
Part A
Calculate the median.
Part B
Calculate the sample mean, ˉx.
Part C
Calculate the sample standard deviation, s.
Part D
Make or sketch a rough histogram of this data, with the size of each bin being 10 (i.e. 70’s, 80’s, 90’s, 100’s). You can draw this by hand or use R
.If you are using ggplot
, you want to use +geom_histogram(breaks=seq(start,end,by))
and add +scale_x_continuous(breaks=seq(start,end,by))
. For each, it creates bins in the histogram, and ticks on the x axis by creating a seq
uence starting at start
(a number), ending at end
(number), by
a certain interval (i.e. by 10
s.).
Is this distribution roughly symmetric or skewed? What would we expect about the mean and the median?
Part E
Suppose instead the person who got the 72 did not show up that day to class, and got a 0 instead. Recalculate the mean and median. What happened and why?
Question 4
Suppose the probabilities of a visitor to Amazon’s website buying 0, 1, or 2 books are 0.2, 0.4, and 0.4 respectively.
Part A
Calculate the expected number of books a visitor will purchase.
Part B
Calculate the standard deviation of book purchases.
Part C
Bonus: try doing this in R
by making an initial dataframe of the data, and then making new columns to the “table” like we did in class.
Question 5
Scores on the SAT (out of 1600) are approximately normally distributed with a mean of 500 and standard deviation of 100.
Part A
What is the probability of getting a score between a 400 and a 600?
Part B
What is the probability of getting a score between a 300 and a 700?
Part C
What is the probability of getting at least a 700?
Part D
What is the probability of getting at most a 700?
Part E
What is the probability of getting exactly a 500?
Question 6
Redo problem 5 by using the pnorm()
command in R
.Hint: This function has four arguments: 1. the value of the random variable, 2. the mean of the distribution, 3. the sd of the distribution, and 4. lower.tail
TRUE
or FALSE
.