πŸƒ Inference for a Single Mean

“The more I love humanity in general, the less I love man in particular.” ― Fyodor Dostoyevsky, The Brothers Karamazov

t.test
Linear Model
Inference
Bootstrap
Null Distributions
Generating Parallel Worlds
Author

Arvind V

Published

November 10, 2022

Modified

June 21, 2024

Abstract
Inference Tests to check the significance of a single Mean

Setting up R packages

knitr::opts_chunk$set(echo = TRUE,message = TRUE,warning = TRUE, fig.align = "center")
options(digits=2)
library(tidyverse)
library(mosaic)
library(infer)
### Dataset from Chihara and Hesterberg's book (Second Edition)
library(resampledata)
library(openintro) # datasets
library(explore) # New, Easy package for Stats Test and Viz, and other things

Introduction

We saw from the diagram created by Allen Downey (footnote 1) that there is only one test! We will now use this philosophy to develop a technique that lets us mechanize several statistical models in the same way, with nearly identical code.

We will use two packages in R. The first, mosaic, helps us develop our intuition for what are called bootstrap- and randomization-based statistical tests. The second, infer, is more recent and can do pretty much all of this, including visualization. In my opinion, though, its code is a little too high-level and does not offer quite the detailed insight that mosaic does.

Case Study #1: Toy data

First we will use a toy dataset with three “imaginary” samples, \(x, y, y2\). Each is normally distributed and made up of 50 observations.

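We start by creating a function that produces a sample of a given size (N) whose sample mean and standard deviation are exactly the specified values mu and sd: scale() standardizes a normal random draw to mean 0 and standard deviation 1, which is then rescaled.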

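rnorm_fixed <- function(N, mu = 0, sd = 1) {
  # scale() standardizes the draws to sample mean 0 and sd 1 exactly;
  # as.numeric() drops the one-column matrix that scale() returns
  as.numeric(scale(rnorm(N))) * sd + mu
}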
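To see why this function is useful, here is a minimal check in base R. It redefines rnorm_fixed() so the block runs on its own (with as.numeric() around scale(), so the result is a plain vector rather than a one-column matrix) and compares it with an ordinary rnorm() draw. The names v_fixed and v_plain are illustrative only.

```r
# Same idea as rnorm_fixed() above; as.numeric() makes the result a plain
# vector rather than the one-column matrix returned by scale()
rnorm_fixed <- function(N, mu = 0, sd = 1) {
  as.numeric(scale(rnorm(N))) * sd + mu
}

set.seed(40)
v_fixed <- rnorm_fixed(50, mu = 0.3, sd = 2)  # standardized, then rescaled
v_plain <- rnorm(50, mean = 0.3, sd = 2)      # ordinary random draw

# Sample statistics of rnorm_fixed() equal the targets (up to rounding)
c(mean = mean(v_fixed), sd = sd(v_fixed))
# Sample statistics of rnorm() only scatter around the targets
c(mean = mean(v_plain), sd = sd(v_plain))
```

Fixing the sample statistics exactly means that any test result on these toy variables depends only on the intended mean and spread, not on the luck of the draw.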

We create three variables: x (explanatory) and y, y2 (dependent).

set.seed(40) # for replication

# Data as vectors ( for t.tests etc)
x <- rnorm_fixed(50, mu = 0.0, sd = 1) #explanatory
y <- rnorm_fixed(50, mu = 0.3, sd = 2) # dependent #1
y2 <- rnorm_fixed(50, mu = 0.5, sd = 1.5) # dependent #2

# Make a tibble with all variables
mydata_wide <- tibble(x = x, y = y, y2 = y2)

# Long form data
mydata_long <- 
  mydata_wide %>%
  pivot_longer(., cols = c(x,y,y2), 
               names_to = "group", 
               values_to = "value")

# Long form data with only dependent variables
mydata_long_y <- 
  mydata_wide %>% 
  select(-x) %>% 
  pivot_longer(., cols = c(y,y2), 
               names_to = "group", 
               values_to = "value")
mydata_wide
mydata_long
mydata_long_y

“Signed Rank” Values

Most statistical tests use the actual values of the data variables. However, some non-parametric tests use only the ranks (the order) of the data. In some cases, the signed ranks of the data values are used instead of the data itself.

The signed rank is calculated as follows:
1. Take the absolute value of each observation in a sample.
2. Rank these absolute values in order of magnitude: the smallest has rank = 1, and so on.
3. Give each rank the sign (+ or -) of the original observation.

signed_rank <- function(x) {sign(x) * rank(abs(x))}
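Applying signed_rank() to small hand-made vectors shows the three steps at work. The vectors below are made up purely for illustration; tied absolute values receive the average of the ranks they span, which is the default ties method of rank().

```r
# Signed rank: rank by absolute size, then restore the original sign
signed_rank <- function(x) {sign(x) * rank(abs(x))}

v <- c(-3, 1, 2.5, -0.5)
signed_rank(v)
# abs(v) = 3, 1, 2.5, 0.5 -> ranks 4, 2, 3, 1 -> signed ranks -4, 2, 3, -1

# Ties share the average rank: abs values 2, 2, 5 -> ranks 1.5, 1.5, 3
signed_rank(c(-2, 2, 5))
# -> -1.5, 1.5, 3
```

Note that sign(0) is 0 in R, so an observation exactly equal to zero gets a signed rank of 0.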

Introduction to Inference for a Single Mean

A set of tests deals with the mean of a single sample. The idea is to evaluate whether the sample mean is consistent with a hypothesized value of the population mean (often zero). Depending upon the nature of the (single) variable, the tests that can be used are as follows:

flowchart TD
    A[Inference for Single Mean] -->|Check Assumptions| B[Normality: Shapiro-Wilk Test shapiro.test\n]
    B --> C{OK?}
    C -->|Yes\n Parametric| D[t.test]
    D <-->F[Linear Model\n with Data] 
    C -->|No\n Non-Parametric| E[wilcox.test]
    E <--> G[Linear Model\n with\n Signed-Ranks of Data]
    C -->|No\n Non-Parametric| P[Bootstrap]
    P <--> Q[Linear Model\n with Signed-Rank\n with Bootstrap]
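The paired boxes in the flowchart can be checked numerically. The sketch below regenerates the toy y with base R only (drawing x first so that the random numbers line up with the data above, assuming the same R version) and compares each test with its linear-model counterpart. For the t-test the match is exact; for the Wilcoxon signed-rank test, the intercept-only model on signed ranks is an approximation, so the two p-values should be close but not identical.

```r
rnorm_fixed <- function(N, mu = 0, sd = 1) as.numeric(scale(rnorm(N))) * sd + mu
signed_rank <- function(x) sign(x) * rank(abs(x))

set.seed(40)
x <- rnorm_fixed(50, mu = 0.0, sd = 1)  # drawn first, as in the toy data
y <- rnorm_fixed(50, mu = 0.3, sd = 2)

# Parametric branch: one-sample t-test of H0: mean(y) = 0 ...
tt <- t.test(y, mu = 0)
# ... matches an intercept-only linear model, y = b0 + error
co <- summary(lm(y ~ 1))$coefficients
rbind(t.test = c(t = unname(tt$statistic), p = tt$p.value),
      lm     = c(t = co[1, "t value"],     p = co[1, "Pr(>|t|)"]))

# Non-parametric branch: Wilcoxon signed-rank test ...
wt <- wilcox.test(y, mu = 0)
# ... is approximated by the same model fitted to the signed ranks of y
co_sr <- summary(lm(signed_rank(y) ~ 1))$coefficients
c(wilcox.test = wt$p.value, lm_signed_rank = co_sr[1, "Pr(>|t|)"])
```

Because rnorm_fixed() fixes the sample mean of y at 0.3 and its standard deviation at 2, the t statistic here is 0.3 / (2 / √50) ≈ 1.06 whatever the seed.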
 

Inspecting and Charting Data

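# Set graph theme
# (theme_custom() is not defined on this page; if it is unavailable,
#  substitute a built-in ggplot2 theme, e.g. theme_minimal())
theme_set(new = theme_custom())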
#
mydata_long %>% 
  gf_density(~ value, group = ~ group, fill = ~ group) %>% 
  gf_fitdistr(dist = "dnorm") %>% 
  gf_facet_wrap(vars(group)) %>% 
  gf_labs(title = "Densities of Original Data Variables",
          subtitle ="Compared with Normal Density")

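Observations from Density Plots
  • All variables appear to be centered near zero (by construction, their means are 0, 0.3, and 0.5)
  • \(x\) seems to have a lower \(sd\) than the other two (by construction, 1 versus 2 and 1.5)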

Testing Assumptions in the Data
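The first box of the flowchart is a normality check. A minimal sketch with base R's shapiro.test(), applied to each toy variable, might look like the following; the 0.05 cut-off is a convention rather than a rule, and the labels in the last line simply echo the flowchart's branches.

```r
rnorm_fixed <- function(N, mu = 0, sd = 1) as.numeric(scale(rnorm(N))) * sd + mu

# Regenerate the toy variables in the same order as above (x, y, y2)
set.seed(40)
toy <- list(x  = rnorm_fixed(50, mu = 0.0, sd = 1),
            y  = rnorm_fixed(50, mu = 0.3, sd = 2),
            y2 = rnorm_fixed(50, mu = 0.5, sd = 1.5))

# Shapiro-Wilk test per variable; H0: the sample comes from a normal distribution
sw_p <- sapply(toy, function(v) shapiro.test(v)$p.value)
sw_p

# Flowchart decision: large p-value -> no evidence against normality
ifelse(sw_p > 0.05, "t.test", "wilcox.test / bootstrap")
```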

Inference

All the tests indicate that the mean of y is not significantly different from zero.
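The flowchart's bootstrap branch (and the page tags "Null Distributions" and "Generating Parallel Worlds") can be sketched with base R resampling. This is not the mosaic-based code the introduction refers to: it uses sample() and replicate() in place of mosaic's do() and resample(), and the number of resamples (B = 5000) is an arbitrary choice. It computes a 95% percentile bootstrap interval for the mean of y, and a p-value from a null distribution built by shifting y so that its mean equals the hypothesized value mu0 = 0.

```r
rnorm_fixed <- function(N, mu = 0, sd = 1) as.numeric(scale(rnorm(N))) * sd + mu

set.seed(40)
x <- rnorm_fixed(50, mu = 0.0, sd = 1)  # drawn only so y matches the toy data
y <- rnorm_fixed(50, mu = 0.3, sd = 2)

B   <- 5000   # number of resamples
mu0 <- 0      # hypothesized population mean
obs_mean <- mean(y)

# Bootstrap distribution of the mean: resample y with replacement
boot_means <- replicate(B, mean(sample(y, replace = TRUE)))
ci <- quantile(boot_means, c(0.025, 0.975))  # 95% percentile interval
ci

# Null distribution: shift y so its mean equals mu0, then resample.
# Each resample is a "parallel world" in which H0 is true.
y_null     <- y - obs_mean + mu0
null_means <- replicate(B, mean(sample(y_null, replace = TRUE)))

# Two-sided p-value: share of null worlds at least as far from mu0 as the data
p_value <- mean(abs(null_means - mu0) >= abs(obs_mean - mu0))
p_value
```

Shifting the data, rather than resampling the raw y, is what makes the resampled worlds obey H0; the interval should contain 0 and the p-value should be large, in line with the conclusion above.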

Case Study #2: Exam data

Let us now choose a dataset from the openintro package:

data("exam_grades")
exam_grades

Inspecting and Charting Data

Testing Assumptions in the Data

Inference

Conclusion

TBW

References

  1. OpenIntro Modern Statistics, Chapter #17
R Package Citations
Package Version Citation
explore 1.3.0 Krasser (2024)
infer 1.0.7 Couch et al. (2021)
openintro 2.4.0 Çetinkaya-Rundel et al. (2022)
resampledata 0.3.1 Chihara and Hesterberg (2018)
TeachHist 0.2.1 Lange (2023)
TeachingDemos 2.13 Snow (2024)
Çetinkaya-Rundel, Mine, David Diez, Andrew Bray, Albert Y. Kim, Ben Baumer, Chester Ismay, Nick Paterno, and Christopher Barr. 2022. openintro: Data Sets and Supplemental Functions from “OpenIntro” Textbooks and Labs. https://CRAN.R-project.org/package=openintro.
Chihara, Laura M., and Tim C. Hesterberg. 2018. Mathematical Statistics with Resampling and R. 2nd ed. Hoboken, NJ: John Wiley & Sons. https://sites.google.com/site/chiharahesterberg/home.
Couch, Simon P., Andrew P. Bray, Chester Ismay, Evgeni Chasnovski, Benjamin S. Baumer, and Mine Çetinkaya-Rundel. 2021. “infer: An R Package for Tidyverse-Friendly Statistical Inference.” Journal of Open Source Software 6 (65): 3661. https://doi.org/10.21105/joss.03661.
Krasser, Roland. 2024. explore: Simplifies Exploratory Data Analysis. https://CRAN.R-project.org/package=explore.
Lange, Carsten. 2023. TeachHist: A Collection of Amended Histograms Designed for Teaching Statistics. https://CRAN.R-project.org/package=TeachHist.
Snow, Greg. 2024. TeachingDemos: Demonstrations for Teaching and Learning. https://CRAN.R-project.org/package=TeachingDemos.

Footnotes

  1. https://allendowney.blogspot.com/2016/06/there-is-still-only-one-test.html

Citation

BibTeX citation:
@online{v2022,
  author = {V, Arvind},
  title = {🃏 {Inference} for a {Single} {Mean}},
  date = {2022-11-10},
  url = {https://av-quarto.netlify.app/content/courses/Analytics/Inference/Modules/100-OneMean/single-mean.html},
  langid = {en},
  abstract = {Inference Tests to check the significance of a single
    Mean}
}
For attribution, please cite this work as:
V, Arvind. 2022. “🃏 Inference for a Single Mean.” November 10, 2022. https://av-quarto.netlify.app/content/courses/Analytics/Inference/Modules/100-OneMean/single-mean.html.