Bootstrapping and Confidence Intervals

Based on Chapter 8 of ModernDive. Code for Quiz 12.

Load the R packages we will use.
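The package-loading chunk does not appear in this post. A plausible setup is sketched below, assuming that `congress_age` is the dataset shipped with the fivethirtyeight package and that the inference verbs come from infer.

```r
# Assumed setup; the original chunk is not shown in this post.
library(tidyverse)       # %>%, summarize(), ggplot2 layers
library(moderndive)      # ModernDive companion package
library(infer)           # specify(), generate(), calculate(), rep_sample_n()
library(fivethirtyeight) # assumed source of the congress_age data
```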

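What is the average age of members who have served in Congress? To answer this, set the seed for reproducibility and draw a random sample of 100 members from congress_age.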

set.seed(4346)

congress_age_100 <- congress_age %>%
  rep_sample_n(size = 100)

Construct the confidence interval

1. Use specify to indicate the variables from congress_age_100 that you are interested in.

congress_age_100 %>%
  specify(response = age)
Response: age (numeric)
# A tibble: 100 x 1
     age
   <dbl>
 1  58  
 2  27.3
 3  59.4
 4  47.8
 5  36.4
 6  62.3
 7  52.5
 8  55.5
 9  44  
10  48  
# ... with 90 more rows

2. Generate 1000 bootstrap replicates of your sample of 100.

congress_age_100 %>%
  specify(response = age) %>%
  generate(reps = 1000, type = "bootstrap")
Response: age (numeric)
# A tibble: 100,000 x 2
# Groups:   replicate [1,000]
   replicate   age
       <int> <dbl>
 1         1  55.2
 2         1  40.8
 3         1  55.7
 4         1  52.5
 5         1  54.5
 6         1  35.8
 7         1  44.5
 8         1  47.9
 9         1  40.8
10         1  37.4
# ... with 99,990 more rows
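
To make the resampling step concrete, here is a minimal base R sketch of what a bootstrap replicate amounts to: drawing 100 values from the sample with replacement. The vector `ages` is made-up stand-in data, not the real congress_age_100 sample, and `one_replicate` is a hypothetical helper name.

```r
set.seed(1)

# Hypothetical stand-in for the 100 sampled ages (not the real data)
ages <- round(runif(100, min = 25, max = 80), 1)

# One bootstrap replicate: resample the same number of values, with replacement
one_replicate <- function(x) sample(x, size = length(x), replace = TRUE)

# 1000 replicates stacked in long form, mirroring the replicate/age columns above
boot_long <- data.frame(
  replicate = rep(1:1000, each = length(ages)),
  age = unlist(lapply(1:1000, function(i) one_replicate(ages)))
)

dim(boot_long)
```

Because each replicate has the same size as the original sample, 1000 replicates of 100 values give the 100,000 rows seen in the output above.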

3. Calculate the mean for each replicate

bootstrap_distribution_mean_age <- congress_age_100 %>%
  specify(response = age) %>%
  generate(reps = 1000, type = "bootstrap") %>%
  calculate(stat = "mean")

bootstrap_distribution_mean_age
Response: age (numeric)
# A tibble: 1,000 x 2
   replicate  stat
       <int> <dbl>
 1         1  51.3
 2         2  48.2
 3         3  49.7
 4         4  50.5
 5         5  51.6
 6         6  47.9
 7         7  49.5
 8         8  50.0
 9         9  51.0
10        10  51.0
# ... with 990 more rows

bootstrap_distribution_mean_age contains 1000 means, one for each replicate.


4. Visualize the bootstrap distribution

visualize(bootstrap_distribution_mean_age)

Calculate the 95% confidence interval using the percentile method

congress_ci_percentile <- bootstrap_distribution_mean_age %>%
  get_confidence_interval(type = "percentile", level = 0.95)

congress_ci_percentile
# A tibble: 1 x 2
  lower_ci upper_ci
     <dbl>    <dbl>
1     48.5     52.7
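
The percentile method takes the middle 95% of the bootstrap means, that is, the range between the 2.5th and 97.5th percentiles. The base R sketch below illustrates the idea on made-up stand-in data (`ages` is hypothetical, not the real sample), so its numbers will not match the interval above.

```r
set.seed(1)

# Hypothetical stand-in sample (not the real congress_age_100 data)
ages <- round(runif(100, min = 25, max = 80), 1)

# 1000 bootstrap means: resample with replacement, then average
boot_means <- replicate(1000, mean(sample(ages, size = length(ages), replace = TRUE)))

# Percentile method: the 2.5th and 97.5th percentiles bound the middle 95%
ci <- quantile(boot_means, probs = c(0.025, 0.975))
ci
```

This sketch relies on the default interpolation of `quantile()`, so it is meant to show the logic rather than to reproduce infer's output digit for digit.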

Calculate the observed mean age of the sample and store it in obs_mean_age.

obs_mean_age <- congress_age_100 %>%
  specify(response = age) %>%
  calculate(stat = "mean") %>%
  pull()

obs_mean_age
[1] 50.533
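
Shade the confidence interval on the bootstrap distribution and mark the sample mean with a vertical line.

visualize(bootstrap_distribution_mean_age) +
  shade_confidence_interval(endpoints = congress_ci_percentile) +
  geom_vline(xintercept = obs_mean_age, color = "hotpink", size = 1)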

Calculate the population mean age from the full congress_age data.

pop_mean_age <- congress_age %>%
  summarize(pop_mean = mean(age)) %>%
  pull()

pop_mean_age
[1] 53.31373

Add the population mean to the plot as a second vertical line.

visualize(bootstrap_distribution_mean_age) +
  shade_confidence_interval(endpoints = congress_ci_percentile) +
  geom_vline(xintercept = obs_mean_age, color = "hotpink", size = 1) +
  geom_vline(xintercept = pop_mean_age, color = "purple", size = 3)

ggsave(filename = "preview.png",
  path = here::here("_posts", "2022-04-25-bootstrapping"))
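
A quick numeric check can complement the plot. The sketch below copies the rounded values printed earlier (an interval of 48.5 to 52.7 and a population mean of 53.31373) and tests whether the interval captures the population mean. The variable `captured` is an illustrative name.

```r
# Rounded values copied from the output shown earlier in this post
lower_ci <- 48.5
upper_ci <- 52.7
pop_mean_age <- 53.31373

# TRUE if the interval contains the population mean
captured <- lower_ci <= pop_mean_age & pop_mean_age <= upper_ci
captured
```

With these values the result is FALSE: this particular sample produced an interval that misses the population mean, which about 5% of such intervals are expected to do at the 95% level.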