# 10: Bootstrapping and Confidence Intervals

Based on Chapter 8 of ModernDive. Code for Quiz 12.

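Load the R packages we will use. Only `tidyverse` is loaded here; the steps below also rely on functions such as `rep_sample_n()` and `specify()` from the `infer` package, and `congress_age` is assumed to come from the `fivethirtyeight` package, so attach those as well if they are not already loaded.

``````
library(tidyverse)
``````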
• Look at the variable definitions in `congress_age`

What is the average age of members who have served in Congress?

• Set the random seed to 123

• Take a sample of 100 from the dataset `congress_age` and assign it to `congress_age_100`

``````
set.seed(???)

??? <- ??? %>%
  rep_sample_n(size = ???)
``````
• `congress_age` is the population and ??? is the sample

• ??? is the number of observations in the population and ??? is the number of observations in your sample
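To make the population/sample distinction concrete, here is a minimal base R sketch. It uses a simulated vector, `population_ages`, which is hypothetical and not the `congress_age` data, and base `sample()` instead of `rep_sample_n()`, so it only illustrates the idea of drawing a reproducible sample without replacement.

```r
# Hypothetical population: 5,000 simulated ages (not the real congress_age data)
set.seed(42)
population_ages <- round(runif(5000, min = 25, max = 90))

# Draw a sample of 50 without replacement
sample_ages <- sample(population_ages, size = 50, replace = FALSE)

length(population_ages)  # number of observations in the population
length(sample_ages)      # number of observations in the sample

# Resetting the seed and repeating the same calls reproduces the same draw
set.seed(42)
population_again <- round(runif(5000, min = 25, max = 90))
sample_again <- sample(population_again, size = 50, replace = FALSE)
identical(sample_ages, sample_again)
```

Setting the seed before sampling is what lets everyone who runs the quiz code get the same sample.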

Construct the confidence interval

1. Use `specify` to indicate the variable from congress_age_100 that you are interested in

``````
congress_age_100 %>%
  ???(response = ???)
``````

2. `generate` 1000 replicates of your sample of 100

``````
congress_age_100 %>%
  specify(response = age) %>%
  ???(reps = 1000, type = "bootstrap")
``````

The output has ??? rows
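As a rough illustration of what bootstrap resampling produces, the base R sketch below resamples a tiny, made-up sample with replacement. The vector `small_sample` and the choice of 5 replicates are assumptions chosen to keep the output small; the code mimics the long, one-row-per-resampled-value layout rather than calling `generate()` itself.

```r
set.seed(1)
small_sample <- c(41, 52, 47, 63, 58, 39, 70, 55)  # hypothetical ages
n_reps <- 5

# One bootstrap replicate = a resample of the same size, drawn with replacement
resamples <- data.frame(
  replicate = rep(seq_len(n_reps), each = length(small_sample)),
  age = unlist(lapply(seq_len(n_reps), function(i) {
    sample(small_sample, size = length(small_sample), replace = TRUE)
  }))
)

# Rows = number of replicates * size of the original sample
nrow(resamples)
```

The same reasoning (replicates times sample size) gives the row count for the quiz data.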

3. `calculate` the mean for each replicate

• Assign to `bootstrap_distribution_mean_age`

• Display `bootstrap_distribution_mean_age`

``````
bootstrap_distribution_mean_age <- congress_age_100 %>%
  specify(response = age) %>%
  generate(reps = 1000, type = "bootstrap") %>%
  ???(stat = "???")

bootstrap_distribution_mean_age
``````
• `bootstrap_distribution_mean_age` has ??? means
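The step of collapsing each replicate to a single statistic can be sketched in base R as follows. Again, `small_sample` and the 5 replicates are hypothetical, and `tapply()` stands in for the grouped summary; this is an illustration of the idea, not how `infer` is implemented.

```r
set.seed(1)
small_sample <- c(41, 52, 47, 63, 58, 39, 70, 55)  # hypothetical ages
n_reps <- 5
n <- length(small_sample)

# Long format: one row per resampled value, tagged with its replicate number
resamples <- data.frame(
  replicate = rep(seq_len(n_reps), each = n),
  age = as.vector(replicate(n_reps, sample(small_sample, size = n, replace = TRUE)))
)

# Collapse each replicate to a single mean
replicate_means <- tapply(resamples$age, resamples$replicate, mean)

length(replicate_means)  # one mean per replicate
```

The collection of these means is the bootstrap distribution that the later steps visualize and summarize.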

4. `visualize` the bootstrap distribution

``````
???(???)
``````

Calculate the 95% confidence interval using the percentile method

• Assign the output to `congress_ci_percentile`

• Display `congress_ci_percentile`

``````
congress_ci_percentile <- bootstrap_distribution_mean_age %>%
  get_confidence_???(type = "???", level = ???)

congress_ci_percentile
``````
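The percentile method takes the middle 95% of the bootstrap means as the interval. A minimal base R sketch, assuming a small made-up sample (`sample_ages`) and 2,000 resamples, uses `quantile()` to find the 2.5th and 97.5th percentiles:

```r
set.seed(2)
sample_ages <- c(41, 52, 47, 63, 58, 39, 70, 55, 49, 61)  # hypothetical ages

# Bootstrap distribution of the mean: resample with replacement, take the mean
boot_means <- replicate(
  2000,
  mean(sample(sample_ages, size = length(sample_ages), replace = TRUE))
)

# Percentile method: cut off (1 - level) / 2 in each tail
level <- 0.95
alpha <- 1 - level
ci <- quantile(boot_means, probs = c(alpha / 2, 1 - alpha / 2))
ci
```

Changing `level` widens or narrows the interval: a higher confidence level cuts off less in each tail and so gives a wider interval.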
• Calculate the observed point estimate of the mean and assign it to `obs_mean_age`

• Display `obs_mean_age`

``````
obs_mean_age <- ??? %>%
  specify(response = ???) %>%
  calculate(stat = "???") %>%
  pull()

obs_mean_age
``````

• Add a line at the observed mean, `obs_mean_age`, to your visualization and color it “hotpink”

``````
visualize(bootstrap_distribution_mean_age) +
  geom_vline(xintercept = ???, color = "hotpink", size = 1)
``````
• Calculate the population mean to see if it is in the 95% confidence interval

• Assign the output to `pop_mean_age`

• Display `pop_mean_age`

``````
pop_mean_age <- ??? %>%
  summarize(pop_mean = mean(age)) %>%
  pull()

pop_mean_age
``````
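Checking whether the interval captured the population mean is a simple comparison against the two endpoints. The sketch below runs the whole idea end to end in base R on a simulated population; the data, sample size, and variable names are all hypothetical and chosen only for illustration.

```r
set.seed(3)

# Hypothetical population and its true mean
population <- rnorm(10000, mean = 50, sd = 10)
pop_mean <- mean(population)

# One sample of 100, then a bootstrap distribution of its mean
sample_x <- sample(population, size = 100)
boot_means <- replicate(1000, mean(sample(sample_x, replace = TRUE)))

# 95% percentile interval
ci <- quantile(boot_means, probs = c(0.025, 0.975))

# TRUE if the population mean falls inside the interval
captured <- ci[[1]] <= pop_mean && pop_mean <= ci[[2]]
captured
```

A 95% interval is expected to miss the population mean for roughly 5% of samples, so a `FALSE` here would not by itself indicate a coding error.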
• Add a line at the population mean, `pop_mean_age`, to the visualization and color it “purple”
``````visualize(bootstrap_distribution_mean_age) +