🃏 Testing a Single Proportion

Permutation

Monte Carlo Simulation

Random Number Generation

Distributions

Generating Parallel Worlds

Author

Arvind V

Published

November 10, 2022

Modified

May 21, 2024

Abstract

Inference Tests for the significance of a Proportion

Setting up R packages

library(tidyverse)
library(mosaic)
library(ggformula)

## Datasets from Chihara and Hesterberg's book (Second Edition)
library(resampledata)

## Datasets from Cetinkaya-Rundel and Hardin's book (First Edition)
library(openintro)

Introduction

We saw from Allen Downey's diagram that there is only one test! We will now use this philosophy to develop a technique that lets us mechanize several statistical models in the same way, with nearly identical code.

We will use the mosaic package in R (with ggformula for plotting) to develop our intuition for what are called permutation-based statistical tests. (A more recent R package, infer, can do pretty much all of this, including visualization. In my opinion, though, its code is a little too high-level and does not offer quite the detailed insight that mosaic does.)
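Downey's recipe can be written down as a small R function. This is a minimal sketch, not the mosaic workflow used on this page: `one_test()`, `diff_in_rates()` and `shuffle_labels()` are hypothetical helper names, and the resume counts (30 per evaluator, with 20 and 10 selections) are made up for illustration. The null model is "shuffle the evaluator labels", and the p-value is one-sided (lower tail), matching the permutation test later in this section.

```r
# Hypothetical helper implementing the "only one test" recipe:
#   1. compute the test statistic on the observed data
#   2. generate many datasets from a model of the null hypothesis
#   3. compute the same statistic on each simulated dataset
#   4. p-value = share of null statistics at least as extreme as the observed one
one_test <- function(data, test_stat, simulate_null, reps = 4999,
                     tail = c("lower", "upper")) {
  tail <- match.arg(tail)
  observed   <- test_stat(data)
  null_stats <- replicate(reps, test_stat(simulate_null(data)))
  p_value <- if (tail == "lower") {
    mean(null_stats <= observed)
  } else {
    mean(null_stats >= observed)
  }
  list(observed = observed, null_stats = null_stats, p_value = p_value)
}

# Hypothetical resume data: F selects 20 of 30, M selects 10 of 30
resumes <- data.frame(
  evaluator = rep(c("F", "M"), each = 30),
  selected  = rep(c(1, 0, 1, 0), times = c(20, 10, 10, 20))
)

# Test statistic: selection rate of M minus selection rate of F
diff_in_rates <- function(d) {
  mean(d$selected[d$evaluator == "M"]) - mean(d$selected[d$evaluator == "F"])
}

# Null model: evaluator labels carry no information, so shuffle them
shuffle_labels <- function(d) {
  d$evaluator <- sample(d$evaluator)
  d
}

set.seed(1)  # arbitrary seed, for reproducibility
res <- one_test(resumes, diff_in_rates, shuffle_labels)
res$observed
res$p_value
```

Swapping in a different `test_stat` or `simulate_null` is what lets the same skeleton cover other tests, which is the point of the "only one test" idea.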

Estimating a Single Proportion

Visualizing a Single Proportion

Hypothesis Testing for a Single Proportion

Intent

Uncertainty in Estimation

Variance

Permutation Visually Demonstrated

Let us look at a permutation exercise visually. We will create dummy data for the following case study:

A set of identical resumes was sent to male and female evaluators. The candidates in the resumes were of both genders. We wish to see whether male and female evaluators differed in how they evaluated the resumes. (To keep things simple, we use just one male and one female evaluator here!)
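One way to build such dummy data is sketched below, together with the observed difference in selection rates computed through mosaic's formula interface. The counts are hypothetical: 30 resumes per evaluator, with the female evaluator (F) selecting 20 and the male evaluator (M) selecting 10, chosen so that the rates differ by one third.

```r
library(mosaic)  # formula interface to mean(): mean(y ~ group, data = ...)

# Hypothetical dummy data, one row per evaluation (counts are assumptions):
# the female evaluator (F) selects 20 of 30 resumes, the male evaluator (M) 10 of 30
data <- data.frame(
  evaluator = rep(c("F", "M"), each = 30),
  candidate_selected = rep(c(1, 0, 1, 0), times = c(20, 10, 10, 20))  # 1 = selected
)

# Selection rate for each evaluator
mean(candidate_selected ~ evaluator, data = data)

# Observed difference in selection rates: diff() gives second group minus first (M - F)
obs_difference <- diff(mean(candidate_selected ~ evaluator, data = data))
obs_difference
```

With the groups ordered F, M, `diff()` subtracts F's rate from M's, so the result carries the name `M`.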

         M 
-0.3333333

So the male evaluator's selection rate is one third (about 33 percentage points) lower than the female evaluator's: a solid disparity between the two evaluators!

Permutation

Now we pretend that there is no difference between the selections made by the two evaluators. Then we can:

Pool all the evaluations.
Randomly re-assign each evaluation (selected or rejected) to one of the two evaluators, by permutation.

What would that pooled, shuffled set of evaluations look like?
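A single shuffle can be sketched with mosaic's `shuffle()`. This assumes hypothetical dummy data (30 resumes per evaluator; F selects 20, M selects 10) and an arbitrary random seed, so the counts after shuffling depend on that seed.

```r
library(mosaic)  # shuffle(), tally() and the formula interface to mean()

# Hypothetical dummy data (assumed counts): F selects 20 of 30, M selects 10 of 30
data <- data.frame(
  evaluator = rep(c("F", "M"), each = 30),
  candidate_selected = rep(c(1, 0, 1, 0), times = c(20, 10, 10, 20))
)

set.seed(2022)  # arbitrary seed so that this one shuffle is reproducible

# Pool all 60 evaluations and deal the evaluator labels out again at random
shuffled <- transform(data, evaluator = shuffle(evaluator))

# Counts of rejected (0) and selected (1) per evaluator after the shuffle:
# the overall totals are unchanged, but their split between F and M can change
tally(candidate_selected ~ evaluator, data = shuffled)

# Difference in selection rates (M minus F) in this one shuffled world
shuffled_difference <- diff(mean(candidate_selected ~ evaluator, data = shuffled))
shuffled_difference
```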

After the shuffle, the selection ratio between the two evaluators is different!

We can now test our hypothesis that there is no bias. We shuffle the data many times (4999 below), compute the difference in selection ratios each time, and plot the distribution of these differences. We then see how this artificially created distribution compares with the difference originally observed from Mother Nature.

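# Set graph theme (theme_minimal() replaces theme_custom(), which is not defined in this section)
theme_set(new = theme_minimal())
#
# Assumes `data` (one row per evaluation, with columns candidate_selected and
# evaluator) and `obs_difference` (the observed difference in selection rates) exist
#
# Shuffle the evaluator labels 4999 times; do() stores each difference
# in a column named M
null_dist <- do(4999) * diff(mean(
  candidate_selected ~ shuffle(evaluator), 
  data = data))
null_dist %>% gf_histogram( ~ M, 
                  fill = ~ (M <= obs_difference), 
                  bins = 25, show.legend = FALSE,
                  xlab = "Bias Proportion", 
                  ylab = "How Often?",
                  title = "Permutation Test on Difference between Groups",
                  subtitle = "") %>% 
  gf_vline(xintercept = ~ obs_difference, color = "red") %>% 
  gf_label(500 ~ obs_difference, label = "Observed\n Bias", 
           show.legend = FALSE) 
# p-value: share of shuffled differences at or below the observed one
mean(~ M <= obs_difference, data = null_dist)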

[1] 0.00220044

We see that the shuffled data can hardly ever (\(p \approx 0.0022\)) mimic what the real-world experiment shows. Hence we have good reason to reject our null hypothesis that there is no bias.
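For a 2 x 2 design like this one, the permutation distribution can also be computed exactly instead of simulated: with the pooled totals fixed, the number of selections that land with one evaluator follows a hypergeometric distribution, and Fisher's exact test is the same conditional test. A sketch in base R on hypothetical data (30 resumes per evaluator; F selects 20, M selects 10), so its p-value will differ from the one above:

```r
# Hypothetical data (assumed counts): F selects 20 of 30 resumes, M selects 10 of 30
evaluator <- rep(c("F", "M"), each = 30)
selected  <- rep(c(1, 0, 1, 0), times = c(20, 10, 10, 20))

# Under the "no bias" null, shuffling labels keeps all 30 selections in the pool
# and deals 30 evaluations to M, so M's number of selections is hypergeometric.
# One-sided p-value = P(M gets this many selections or fewer).
n_selected_M <- sum(selected[evaluator == "M"])
p_exact <- phyper(n_selected_M, m = sum(selected), n = sum(1 - selected),
                  k = sum(evaluator == "M"))

# Fisher's exact test on the 2 x 2 table (rows F, M; columns 0, 1):
# alternative = "less" picks the same lower tail
tab <- table(evaluator, selected)
p_fisher <- fisher.test(tab, alternative = "less")$p.value

c(exact = p_exact, fisher = p_fisher)
```

A Monte Carlo estimate such as `mean(~ M <= obs_difference, data = null_dist)`, run on the same data, should land close to this exact value, with the gap shrinking as the number of shuffles grows.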

Case Study #1: TBD

Case Study #2: Weight vs Exercise in the YRBSS Survey

An interactive app

https://openintro.shinyapps.io/CLT_prop/
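The app illustrates the Central Limit Theorem for proportions: sample proportions \(\hat{p}\) from samples of size \(n\) pile up around the true proportion \(p\), with standard error \(\sqrt{p(1-p)/n}\). A minimal base-R simulation sketch, assuming \(p = 0.3\) and \(n = 50\) (illustrative values only):

```r
set.seed(123)    # arbitrary seed, for reproducibility
p_true <- 0.3    # assumed population proportion
n      <- 50     # assumed sample size
reps   <- 10000  # number of simulated samples

# Each sample proportion = (number of successes in n Bernoulli trials) / n
p_hat <- rbinom(reps, size = n, prob = p_true) / n

# Compare the simulated spread with the theoretical standard error
c(mean_p_hat = mean(p_hat),
  sd_p_hat   = sd(p_hat),
  theory_se  = sqrt(p_true * (1 - p_true) / n))
```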

Conclusion

References

Mine Çetinkaya-Rundel and Johanna Hardin, OpenIntro Modern Statistics: Chapter 17
Laura M. Chihara and Tim C. Hesterberg, Mathematical Statistics with Resampling and R, 2nd ed. John Wiley & Sons, 3 August 2018 (© 2019).
https://iconarray.com/download

R Package Citations

Package	Version	Citation
ggbrace	0.1.1	Huber (2024)
openintro	2.4.0	Çetinkaya-Rundel et al. (2022)
resampledata	0.3.1	Chihara and Hesterberg (2018)

Çetinkaya-Rundel, Mine, David Diez, Andrew Bray, Albert Y. Kim, Ben Baumer, Chester Ismay, Nick Paterno, and Christopher Barr. 2022. openintro: Data Sets and Supplemental Functions from “OpenIntro” Textbooks and Labs. https://CRAN.R-project.org/package=openintro.

Chihara, Laura M., and Tim C. Hesterberg. 2018. Mathematical Statistics with Resampling and R. 2nd ed. Hoboken, NJ: John Wiley & Sons. https://sites.google.com/site/chiharahesterberg/home.

Huber, Nicolas. 2024. ggbrace: Curly Braces for “ggplot2”. https://CRAN.R-project.org/package=ggbrace.

Citation

BibTeX citation:

@online{v2022,
  author = {V, Arvind},
  title = {🃏 {Testing} a {Single} {Proportion}},
  date = {2022-11-10},
  url = {https://av-quarto.netlify.app/content/courses/Analytics/Inference/Modules/180-OneProp/single-prop.html},
  langid = {en},
  abstract = {Inference Tests for the significance of a Proportion}
}

For attribution, please cite this work as:

V, Arvind. 2022. “🃏 Testing a Single Proportion.” November 10, 2022. https://av-quarto.netlify.app/content/courses/Analytics/Inference/Modules/180-OneProp/single-prop.html.