πŸƒ Testing a Single Proportion

Monte Carlo Simulation
Random Number Generation
Generating Parallel Worlds

Arvind V


November 10, 2022


June 27, 2024

Inference Tests for the significance of a Proportion

Setting up R packages


## Datasets from Chihara and Hesterberg's book (Second Edition)
library(resampledata)
## Datasets from Cetinkaya-Rundel and Hardin's book (First Edition)
library(openintro)


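We saw from the diagram created by Allen Downey that there is only one test! Every hypothesis test computes a test statistic from the observed data, simulates many datasets under the null hypothesis, and then asks how often the simulated statistic is at least as extreme as the observed one. We will now use this philosophy to develop a technique that lets us handle several statistical models in the same way, with nearly identical code.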

We will use the mosaic package in R to develop our intuition for what are called permutation-based statistical tests. (There is also a more recent R package called infer, which can do pretty much all of this, including visualization. In my opinion, its code is a little too high-level and does not offer quite the detailed insight that the mosaic package does.)
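The recipe behind Downey's diagram can be sketched in a few lines of base R. This is a minimal illustration, not the mosaic workflow used on this page: `one_test()` and its arguments are hypothetical names, and the two-sided option assumes the test statistic is centred at 0 under the null hypothesis. The usage example tests a single proportion with invented coin-flip data.

```r
# A minimal sketch of the "there is only one test" recipe (after Allen Downey).
# one_test() and its arguments are hypothetical names, not from any package.
one_test <- function(data, test_stat, simulate_null, reps = 4999,
                     alternative = c("two.sided", "less", "greater")) {
  alternative <- match.arg(alternative)
  # 1. Effect size (test statistic) in the real data
  observed <- test_stat(data)
  # 2. The same statistic in many simulated "null worlds"
  null_stats <- replicate(reps, test_stat(simulate_null(data)))
  # 3. p-value: how often is a null world at least as extreme as the real one?
  #    (two.sided assumes the statistic is centred at 0 under the null)
  p_value <- switch(alternative,
    less      = mean(null_stats <= observed),
    greater   = mean(null_stats >= observed),
    two.sided = mean(abs(null_stats) >= abs(observed)))
  list(observed = observed, null_stats = null_stats, p_value = p_value)
}

# Usage: is a coin that landed heads 70 times out of 100 fair?
# The flips are invented for illustration.
set.seed(42)
flips <- c(rep(TRUE, 70), rep(FALSE, 30))
res <- one_test(
  flips,
  test_stat     = function(x) mean(x) - 0.5,  # distance from p = 0.5
  simulate_null = function(x) sample(c(TRUE, FALSE), length(x), replace = TRUE),
  alternative   = "greater"
)
res$observed   # 0.7 - 0.5 = 0.2
res$p_value
```

Swapping in a different `test_stat` and `simulate_null` (for instance, shuffling group labels instead of flipping a fair coin) gives a permutation test with the same skeleton, which is the point of the "only one test" view.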

Estimating a Single Proportion

Visualizing a Single Proportion

Hypothesis Testing for a Single Proportion


Uncertainty in Estimation


Permutation Visually Demonstrated

Let us look at a permutation exercise visually. We will create dummy data for the following case study:

A set of identical resumes was sent to male and female evaluators. The candidates in the resumes were of both genders. We wish to see if there was a difference in the way the resumes were evaluated by male and female evaluators. (We use just one male and one female evaluator here, to keep things simple!)
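The page's own dummy data is not shown in this extract. Here is a sketch of what such data could look like in base R. It assumes each evaluator sees 24 resumes and uses invented selection counts. The column names `evaluator` and `candidate_selected` match those in the mosaic code in this section, while the level labels F and M are an assumption.

```r
# Hypothetical dummy data for the resume case study. The counts are
# invented for illustration and are NOT the data used on this page.
# One row per evaluated resume; candidate_selected is 1 (selected) or 0.
data <- data.frame(
  evaluator = rep(c("F", "M"), each = 24),
  candidate_selected = c(rep(1, 21), rep(0, 3),   # F selects 21 of 24
                         rep(1, 14), rep(0, 10))  # M selects 14 of 24
)

# Selection proportion for each evaluator
props <- tapply(data$candidate_selected, data$evaluator, mean)
print(props)

# Observed difference in selection proportion, male minus female
obs_difference <- props[["M"]] - props[["F"]]
print(obs_difference)
```

With these made-up counts, F selects 0.875 of the resumes and M about 0.583, a difference of about -0.29.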


So, we have a solid disparity in percentage of selection between the two evaluators!


Now we pretend that there is no difference between the selections made by the two evaluators. Then we can simply:

  • Pool all the evaluations
  • Randomly re-assign each evaluation (selected or rejected) to one of the two evaluators, by permutation (shuffling the evaluator labels).

What would that pooled, shuffled set of evaluations look like?
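One shuffle can be sketched in base R as follows. The counts are invented for illustration (24 resumes per evaluator, 35 selections in total). `sample()` without replacement performs the permutation of the evaluator labels.

```r
# Hypothetical dummy data (invented counts, not the author's):
# 21 of 24 selected by F, 14 of 24 selected by M
data <- data.frame(
  evaluator = rep(c("F", "M"), each = 24),
  candidate_selected = c(rep(1, 21), rep(0, 3), rep(1, 14), rep(0, 10))
)

# One "parallel world": pool the evaluations and shuffle the labels.
# sample() without replacement permutes the labels, so each evaluator
# still "gets" 24 resumes and the pooled 35 selections are unchanged.
set.seed(1)
shuffled <- data
shuffled$evaluator <- sample(data$evaluator)

# Selection proportion per evaluator after the shuffle
tapply(shuffled$candidate_selected, shuffled$evaluator, mean)
```

Run it with a different seed and the two proportions move around, while their average stays fixed at 35/48.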


As can be seen, the selection ratios of the two evaluators have changed after shuffling!

We can now test our hypothesis that there is no bias. We shuffle the data many, many times and calculate the difference in selection ratio each time. We then plot the distribution of these differences and see how this artificially created distribution compares with the difference originally observed from Mother Nature.

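# Assumed setup (done earlier on the page, not shown here):
# mosaic provides do(), shuffle() and formula-based mean(), and attaches
# ggformula (gf_*() plotting) and ggplot2 (theme_set())
library(mosaic)
# `data` has columns `evaluator` and `candidate_selected` (0/1);
# observed difference in selection proportion between the evaluators
obs_difference <- diff(mean(candidate_selected ~ evaluator, data = data))

# Set graph theme (theme_custom() is the author's own theme;
# a built-in theme such as theme_minimal() can stand in for it)
theme_set(new = theme_custom())
# Shuffle the evaluator labels 4999 times, recording the difference
# in selection proportion each time
null_dist <- do(4999) * diff(mean(
  candidate_selected ~ shuffle(evaluator),
  data = data))
# null_dist %>% names()   # one column, M (presumably M minus F)
null_dist %>% gf_histogram( ~ M,
                  fill = ~ (M <= obs_difference),
                  bins = 25, show.legend = FALSE,
                  xlab = "Bias Proportion",
                  ylab = "How Often?",
                  title = "Permutation Test on Difference between Groups",
                  subtitle = "") %>%
  gf_vline(xintercept = ~ obs_difference, color = "red" ) %>%
  gf_label(500 ~ obs_difference, label = "Observed\n Bias",
           show.legend = FALSE)
# p-value: share of shuffles at least as low as the observed difference
mean(~ M <= obs_difference, data = null_dist)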


[1] 0.00220044

We see that the artificial data can hardly ever (\(p \approx 0.0022\)) mimic what the real-world experiment shows. Hence we have good reason to reject our null hypothesis that there is no bias.
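To see the machinery without mosaic, here is a base R sketch of the same left-tailed permutation test. It uses invented counts, so its p-value will not match the 0.0022 reported above. `diff_prop()` is a hypothetical helper name.

```r
# Permutation test for a difference in two proportions, base R only.
# The counts are hypothetical (invented for illustration), not the author's.
data <- data.frame(
  evaluator = rep(c("F", "M"), each = 24),
  candidate_selected = c(rep(1, 21), rep(0, 3), rep(1, 14), rep(0, 10))
)

# Difference in selection proportion, male minus female
diff_prop <- function(selected, evaluator) {
  mean(selected[evaluator == "M"]) - mean(selected[evaluator == "F"])
}

obs_difference <- diff_prop(data$candidate_selected, data$evaluator)

set.seed(2024)
reps <- 4999
# Each replicate shuffles the evaluator labels (one "parallel world")
null_dist <- replicate(reps,
  diff_prop(data$candidate_selected, sample(data$evaluator)))

# Left-tailed p-value, as in the mosaic code: share of shuffles
# at least as low as the observed difference
p_value <- mean(null_dist <= obs_difference)
p_value

hist(null_dist, breaks = 25, main = "Permutation null distribution",
     xlab = "Difference in selection proportion (M - F)")
abline(v = obs_difference, col = "red")
```

The logic is identical to the mosaic version: `sample()` plays the role of `shuffle()`, and `replicate()` plays the role of `do()`.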

Case Study #1: TBD

Case Study #2: Weight vs Exercise in the YRBSS Survey

An interactive app




  1. Mine Çetinkaya-Rundel and Johanna Hardin, OpenIntro Modern Statistics: Chapter 17
  2. Laura M. Chihara, Tim C. Hesterberg, Mathematical Statistics with Resampling and R. 3 August 2018. © 2019 John Wiley & Sons, Inc.
  3. https://iconarray.com/download
R Package Citations

| Package | Version | Citation |
|---|---|---|
| ggbrace | 0.1.1 | Huber (2024) |
| openintro | 2.5.0 | Çetinkaya-Rundel et al. (2024) |
| resampledata | 0.3.1 | Chihara and Hesterberg (2018) |

Çetinkaya-Rundel, Mine, David Diez, Andrew Bray, Albert Y. Kim, Ben Baumer, Chester Ismay, Nick Paterno, and Christopher Barr. 2024. openintro: Datasets and Supplemental Functions from “OpenIntro” Textbooks and Labs. https://CRAN.R-project.org/package=openintro.
Chihara, Laura M., and Tim C. Hesterberg. 2018. Mathematical Statistics with Resampling and R. 2nd ed. Hoboken, NJ: John Wiley & Sons. https://sites.google.com/site/chiharahesterberg/home.
Huber, Nicolas. 2024. ggbrace: Curly Braces for “ggplot2”. https://CRAN.R-project.org/package=ggbrace.


BibTeX citation:
  author = {V, Arvind},
  title = {🃏 {Testing} a {Single} {Proportion}},
  date = {2022-11-10},
  url = {https://av-quarto.netlify.app/content/courses/Analytics/Inference/Modules/180-OneProp/},
  langid = {en},
  abstract = {Inference Tests for the significance of a Proportion}
For attribution, please cite this work as:
V, Arvind. 2022. “🃏 Testing a Single Proportion.” November 10, 2022. https://av-quarto.netlify.app/content/courses/Analytics/Inference/Modules/180-OneProp/.