Testing a Single Proportion
Setting up R packages
Introduction
We saw from the diagram created by Allen Downey that there is only one test! We will now use this philosophy to develop a technique that lets us mechanize several statistical models in the same way, with nearly identical code.
We will use the mosaic package in R to develop our intuition for what are called permutation-based statistical tests. (There is also a more recent R package called infer which can do pretty much all of this, including visualization. In my opinion, its code is a little too high-level and does not offer quite the detailed insight that the mosaic package does.)
Estimating a Single Proportion
Visualizing a Single Proportion
Hypothesis Testing for a Single Proportion
Intent
Uncertainty in Estimation
Variance
Permutation Visually Demonstrated
We will look visually at a permutation exercise. We will create dummy data that mirrors the following case study:
A set of identical resumes was sent to male and female evaluators. The candidates in the resumes were of both genders. We wish to see whether male and female evaluators differed in the way they evaluated the resumes. (We use just one male and one female evaluator here, to keep things simple!)
M
-0.3333333
So, we have a solid disparity between the two evaluators: the male evaluator's selection proportion is about 0.33 lower than the female evaluator's!
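The code that builds the dummy data is not shown above. As a minimal sketch in base R, the data frame `resumes` and the counts below (12 resumes per evaluator, with 8 and 4 selections) are hypothetical, chosen only so that the difference comes out to -1/3 like the figure above; the actual dummy data may well differ.

```r
# Hypothetical dummy data: 12 resumes per evaluator (assumed counts, not from the source)
resumes <- data.frame(
  evaluator = rep(c("F", "M"), each = 12),
  candidate_selected = c(
    rep(c(1, 0), times = c(8, 4)),  # female evaluator selects 8 of 12
    rep(c(1, 0), times = c(4, 8))   # male evaluator selects 4 of 12
  )
)

# Selection proportion for each evaluator
props <- tapply(resumes$candidate_selected, resumes$evaluator, mean)

# Observed difference in selection proportion, male minus female
obs_difference <- props[["M"]] - props[["F"]]
obs_difference
```

The sign convention (male minus female) matches the `M` label on the value reported above.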
Permutation
Now we pretend that there is no difference between the selections made by either set of evaluators. So we can just:
- Pool up all the evaluations
- Arbitrarily re-assign a given candidate (selected or rejected) to either of the two sets of evaluators, by permutation.
What would that pooled, shuffled set of evaluations look like?
As can be seen, the ratio is different!
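A single shuffle can be sketched in base R as follows. The data frame `resumes`, its counts, the helper `diff_in_props()`, and the seed are all hypothetical and only illustrate the idea: the selections stay where they are, and only the evaluator labels are permuted.

```r
set.seed(42)  # arbitrary seed, only to make the shuffle repeatable

# Hypothetical dummy data: 12 resumes per evaluator (assumed counts)
resumes <- data.frame(
  evaluator = rep(c("F", "M"), each = 12),
  candidate_selected = c(rep(c(1, 0), times = c(8, 4)),
                         rep(c(1, 0), times = c(4, 8)))
)

# Hypothetical helper: difference in selection proportion, male minus female
diff_in_props <- function(selected, evaluator) {
  mean(selected[evaluator == "M"]) - mean(selected[evaluator == "F"])
}

# Permute the evaluator labels; each label is reused exactly once
shuffled_evaluator <- sample(resumes$evaluator)

observed <- diff_in_props(resumes$candidate_selected, resumes$evaluator)
shuffled <- diff_in_props(resumes$candidate_selected, shuffled_evaluator)
c(observed = observed, shuffled = shuffled)
```

Because the labels are only rearranged, each evaluator still handles 12 resumes and the total number of selections is unchanged. A particular shuffle will usually, though not always, give a difference other than the observed one.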
We can now test our hypothesis that there is no bias. We shuffle the data many, many times, calculate the difference in selection ratio each time, plot the distribution of those differences, and see how that artificially created distribution compares with the originally observed figure from Mother Nature.
# Set graph theme
theme_set(new = theme_custom())

# Build the null distribution: shuffle the evaluator labels 4999 times
# and record the difference in selection proportion each time
null_dist <- do(4999) * diff(mean(
  candidate_selected ~ shuffle(evaluator),
  data = data))
# null_dist %>% names()

# Histogram of the shuffled differences, with the observed difference marked
null_dist %>%
  gf_histogram(~ M,
    fill = ~ (M <= obs_difference),
    bins = 25, show.legend = FALSE,
    xlab = "Bias Proportion",
    ylab = "How Often?",
    title = "Permutation Test on Difference between Groups",
    subtitle = "") %>%
  gf_vline(xintercept = ~ obs_difference, color = "red") %>%
  gf_label(500 ~ obs_difference, label = "Observed\n Bias",
    show.legend = FALSE)

# Fraction of shuffles at least as extreme as the observed difference
mean(~ M <= obs_difference, data = null_dist)
[1] 0.00220044
We see that the artificial data can hardly ever (\(p \approx 0.0022\)) mimic what the real-world experiment is showing. Hence we have good reason to reject our NULL Hypothesis that there is no bias.
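The same shuffle-and-count logic can be sketched without mosaic, in base R. This is an illustration under assumptions, not the code used above: the data frame `resumes`, its counts, the helper `diff_in_props()`, the seed, and the small tolerance `tol` are all hypothetical. With so few hypothetical resumes, the p-value here comes out much larger (roughly 0.11) than the one reported above.

```r
set.seed(2022)  # arbitrary seed for repeatability

# Hypothetical dummy data: 12 resumes per evaluator (assumed counts)
resumes <- data.frame(
  evaluator = rep(c("F", "M"), each = 12),
  candidate_selected = c(rep(c(1, 0), times = c(8, 4)),
                         rep(c(1, 0), times = c(4, 8)))
)

# Hypothetical helper: difference in selection proportion, male minus female
diff_in_props <- function(selected, evaluator) {
  mean(selected[evaluator == "M"]) - mean(selected[evaluator == "F"])
}

obs_difference <- diff_in_props(resumes$candidate_selected, resumes$evaluator)

# Null distribution: recompute the difference under 4999 label shuffles
n_shuffles <- 4999
null_diffs <- replicate(
  n_shuffles,
  diff_in_props(resumes$candidate_selected, sample(resumes$evaluator))
)

# One-sided p-value: fraction of shuffles at least as negative as observed.
# The tolerance guards against floating-point ties being missed.
tol <- 1e-9
p_value <- mean(null_diffs <= obs_difference + tol)

# A common variant counts the observed data as one extra shuffle,
# so the estimate can never be exactly zero
p_value_plus_one <- (sum(null_diffs <= obs_difference + tol) + 1) / (n_shuffles + 1)

c(p_value = p_value, p_value_plus_one = p_value_plus_one)
```

The comparison `<=` matches the direction used in the histogram fill above: only shuffles showing at least as much bias against selection by the male evaluator count as "as extreme".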
Case Study #1: TBD
Case Study #2: Weight vs Exercise in the YRBSS Survey
An interactive app
Conclusion
References
- Mine Çetinkaya-Rundel and Johanna Hardin, OpenIntro Modern Statistics: Chapter 17
- Laura M. Chihara, Tim C. Hesterberg, Mathematical Statistics with Resampling and R. 3 August 2018. © 2019 John Wiley & Sons, Inc.
- https://iconarray.com/download
R Package Citations
Citation
@online{v2022,
author = {V, Arvind},
title = {{Testing} a {Single} {Proportion}},
date = {2022-11-10},
url = {https://av-quarto.netlify.app/content/courses/Analytics/Inference/Modules/180-OneProp/},
langid = {en},
abstract = {Inference Tests for the significance of a Proportion}
}