Applied Metaphors: Learning TRIZ, Complexity, Data/Stats/ML using Metaphors


๐Ÿƒ Inference for a Single Mean

"The more I love humanity in general, the less I love man in particular." ― Fyodor Dostoyevsky, The Brothers Karamazov

t.test
Inference
Bootstrap
Null Distributions
Generating Parallel Worlds
Author

Arvind V.

Published

November 10, 2022

Modified

December 12, 2024

Abstract
Inference Tests for a Single population Mean

…neither let us despair over how small our successes are. For however much our successes fall short of our desire, our efforts aren't in vain when we are farther along today than yesterday.

โ€” John Calvin

Setting up R packages

library(tidyverse)
library(mosaic)
library(ggformula)
library(infer)
library(broom) # Clean test results in tibble form
library(resampledata) # Datasets from Chihara and Hesterberg's book
library(openintro) # More datasets

Plot Theme

# https://stackoverflow.com/questions/74491138/ggplot-custom-fonts-not-working-in-quarto

# Chunk options
knitr::opts_chunk$set(
  fig.width = 7,
  fig.asp = 0.618, # Golden Ratio
  # out.width = "80%",
  fig.align = "center", tidy = TRUE
)
### Ggplot Theme
### https://rpubs.com/mclaire19/ggplot2-custom-themes

theme_custom <- function() {
  font <- "Roboto Condensed" # assign font family up front

  theme_classic(base_size = 14) %+replace% # replace elements we want to change

    theme(
      text = element_text(family = font),
      panel.grid.minor = element_blank(), # strip minor gridlines

      # text elements
      plot.title = element_text( # title
        family = font, # set font family
        # size = 20,               #set font size
        face = "bold", # bold typeface
        hjust = 0, # left align
        # vjust = 2                #raise slightly
        margin = margin(0, 0, 10, 0)
      ),
      plot.subtitle = element_text( # subtitle
        family = font, # font family
        # size = 14,                #font size
        hjust = 0,
        margin = margin(2, 0, 5, 0)
      ),
      plot.caption = element_text( # caption
        family = font, # font family
        size = 8, # font size
        hjust = 1
      ), # right align

      axis.title = element_text( # axis titles
        family = font, # font family
        size = 10 # font size
      ),
      axis.text = element_text( # axis text
        family = font, # axis family
        size = 8
      ) # font size
    )
}

# Set graph theme
theme_set(new = theme_custom())
#

Introduction

In this module, we will answer a basic Question: What is the mean μ of the population?

Recall that the mean is the first of our Summary Statistics. We wish to know more about the mean of the population from which we have drawn our data sample.

We will do this in several ways, based on the assumptions we are willing to adopt about our data. First, we will use a toy dataset with one "imaginary" sample, normally distributed and made up of 50 observations. Since we "know the answer", we will be able to build up some belief in the tests and procedures, which we will dig into to form our intuitions.

We will then use a real-world dataset to make inferences on the means of Quant variables therein, and decide what that could tell us.

Statistical Inference is almost an Attitude!

As we will notice, the process of Statistical Inference is an attitude: ain't nothing happenin'! We look at data that we might have received or collected ourselves, and view it with an attitude of some disbelief. We state either that:

  1. there is really nothing happening with our research question, and that anything we see in the data is the outcome of random chance.
  2. the value/statistic indicated by the data is off the mark and ought to be something else.

We then calculate how slim the chances are of the given data sample showing up like that, given our belief. It is a distance measurement of sorts. If those chances are too low, then that might alter our belief. This is the attitude that lies at the heart of Hypothesis Testing.

Important

The calculation of chances is both a logical and a feasible procedure, since we are dealing with samples from a population. If many other samples were to give us quite different estimates, then we would discredit the estimate derived from ours.

Each test we perform will mechanize this attitude in different ways, based on assumptions and conveniences. (And history)

Case Study #1: Toy data

Since the classical t-test assumes the sample is normally distributed, let us generate a sample that is just so:

set.seed(40) # for replication
#
# Data as individual vectors
# ( for t.tests etc)
# Generate normally distributed data with mean = 2, sd = 2, length = 50
y <- rnorm(n = 50, mean = 2, sd = 2)

# And as tibble too
mydata <- tibble(y = y)
mydata
          y
      <dbl>
 2.95547807
 2.99236565
 0.28083140
 0.34188009
 1.35685383
-0.60754080
-0.84297320
 5.48982989
 1.42344128
-0.61773144
(1-10 of 50 rows shown)

Inspecting and Charting Data

# Set graph theme
theme_set(new = theme_custom())
#
mydata %>%
  gf_density(~y) %>%
  gf_fitdistr(dist = "dnorm") %>%
  gf_labs(
    title = "Densities of Original Data Variables",
    subtitle = "Compared with Normal Density"
  )

Note: Observations from Density Plots
  • The variable y appears to be centred around 2.
  • It does not seem to be normally distributed…
  • So assumptions are not always valid…

Research Question

Research Questions are always about the population! Here goes:

Note: Research Question

Could the mean of the population μ, from which y has been drawn, be zero?

Assumptions

Note: Testing for Normality

The y-variable does not appear to be normally distributed. This would affect the test we can use to make inferences about the population mean.
There are formal tests for normality too. We will do them in the next case study. For now, let us proceed naively.

Inference

  • The t-test
  • Wilcoxon's Signed-Rank Test
  • Using Permutation and Bootstrap
  • Intuitive

A. Model

We have mean(y) = ȳ. We formulate "our disbelief" of ȳ with a NULL Hypothesis, about the population, as follows:

H0: μ = 0

And the alternative hypothesis, again about the population, as:

Ha: μ ≠ 0

B. Code

# t-test
t1 <- mosaic::t_test(
  y, # Name of variable
  mu = 0, # belief of population mean
  alternative = "two.sided"
) %>% # Check both sides

  broom::tidy() # Make results presentable, and plottable!!
t1
estimate statistic p.value      parameter conf.low conf.high method            alternative
2.045689 6.785596  1.425495e-08 49        1.439852 2.651526  One Sample t-test two.sided
(1 row)
Important: Recall Confidence Intervals

Recall how we calculated means, standard deviations from data (samples). If we could measure the entire population, then there would be no uncertainty in our estimates for means and sd-s. Since we are forced to sample, we can only estimate population parameters based on the sample estimates and state how much off we might be.

Confidence intervals for population means are given by:

CI = ȳ ± constant * StandardError = ȳ ± 1.96 * sd/√n

Assuming y is normally distributed, the constant is 1.96 for a confidence level of 95%. What that means is: if we take multiple samples like y from the population, their means (which are random) will land within CI of the population mean (which is fixed!) 95% of the time. Uff…! May remind you of Buffon's Needle…

If X ~ N(2, 2), then 
    P(X <= -1.919928) = 0.025   P(X <=  5.919928) = 0.975
    P(X >  -1.919928) = 0.975   P(X >   5.919928) = 0.025

So ȳ, i.e. the estimate, is 2.045689. The confidence intervals do not straddle zero. The chances that this particular value of the mean (2.045689) would occur at random, under the assumption that μ is zero, are exceedingly slim: p.value = 1.425495e-08. Hence we can reject the NULL hypothesis that the true population, of which y is a sample, could have mean μ = 0.
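The interval arithmetic is easy to check by hand. A minimal sketch (Python here, for illustration, plugging in the ȳ and standard error reported above):

```python
import math

# Numbers reported by the t-test output above
y_bar = 2.045689   # sample mean
se = 0.3014752     # standard error, sd(y) / sqrt(50)

# 95% CI assuming normality: mean +/- 1.96 standard errors
ci_low = y_bar - 1.96 * se
ci_high = y_bar + 1.96 * se
print(round(ci_low, 4), round(ci_high, 4))  # 1.4548 2.6366 -- excludes 0
```

The R output's slightly wider interval (1.439852, 2.651526) comes from using the exact t-quantile for 49 degrees of freedom (about 2.01) rather than the normal-approximation 1.96.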

"Signed Rank" Values: A Small Digression

When the Quant variable we want to test is not normally distributed, we need to think of other ways to perform our inference: our assumption about normality has been invalidated.
Most statistical tests use the actual values of the data variable. However, when assumptions are invalidated, the data are used in rank-transformed form. In some cases, the signed rank of the data values is used instead of the data itself. The signed ranks are then tested to see if there are more of one polarity than the other, roughly speaking, and how probable this could be.

Signed Rank is calculated as follows:

  1. Take the absolute value of each observation in the sample.
  2. Rank these absolute values; the smallest gets rank = 1, and so on.
  3. Give each rank the sign of the original observation (+ or -).
signed_rank <- function(x) {
  sign(x) * rank(abs(x))
}
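The same three steps can be sketched outside R too; a Python version for illustration (assuming no ties and no exact zeros, which R's rank() and sign() handle more carefully):

```python
def signed_rank(x):
    """Rank the absolute values (smallest gets rank 1), then give each
    rank the sign of its original observation. Assumes no ties/zeros."""
    order = sorted(range(len(x)), key=lambda i: abs(x[i]))
    ranks = [0] * len(x)
    for r, i in enumerate(order, start=1):
        ranks[i] = r  # rank of the i-th observation by |value|
    return [r if v > 0 else -r for v, r in zip(x, ranks)]

print(signed_rank([-2, 1, 3, -0.5]))  # [-3, 2, 4, -1]
```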

Since we are dealing with the mean, the sign of the rank becomes important to use.

A. Model

mean(signed_rank(y)) = β0

H0: μ0 = 0

Ha: μ0 ≠ 0

B. Code

# Standard Wilcoxon Signed_Rank Test
t2 <- wilcox.test(y, # variable name
  mu = 0, # belief
  alternative = "two.sided",
  conf.int = TRUE,
  conf.level = 0.95
) %>%
  broom::tidy()
t2
estimate statistic p.value      conf.low conf.high
2.045331 144       1.036606e-06 1.383205 2.721736
(1 row; 5 of 7 columns shown)
# Can also do this equivalently
# t-test with signed_rank data
t3 <- t.test(signed_rank(y),
  mu = 0,
  alternative = "two.sided",
  conf.int = TRUE,
  conf.level = 0.95
) %>%
  broom::tidy()
t3
estimate statistic p.value      parameter conf.low
20.26    6.700123  1.933926e-08 49        14.1834
(1 row; 5 of 8 columns shown)

Again, the confidence intervals do not straddle 0, and we need to reject the belief that the mean is close to zero.

Note

Note how the Wilcoxon Test reports results about y (estimate ≈ 2.045), even though it computes with signed_rank(y). The "equivalent t-test" on signed-rank data cannot do this, since its estimate (20.26) is on the rank scale; it nevertheless leads to the same conclusion.

We saw from the diagram created by Allen Downey that there is only one test [1]! We will now use this philosophy to develop a technique that mechanizes several Statistical Models in the same way, with nearly identical code.

We can use two packages in R: mosaic, to develop our intuition for what are called permutation-based statistical tests; and a more recent package, infer, which can do pretty much all of this, including visualization.

We will stick with mosaic for now. We will do a permutation test first, and then a bootstrap test. In subsequent modules, we will use infer also.

For the Permutation test, we mechanize our belief that μ = 0 by shuffling the polarities of the y observations randomly 9999 times, to generate other samples from the population y could have come from [2]. If these shuffled samples can frequently achieve means as extreme as the observed ȳ, then we might believe that the population mean may well be 0!
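Stripped of the mosaic machinery, that sign-flipping loop is tiny. A Python sketch with a small hypothetical sample (the R code below does the real thing):

```python
import random

random.seed(42)  # for replication

def sign_flip_pvalue(data, reps=4999):
    """Permutation test of H0: mu = 0. Flip the sign of each observation
    at random, record the mean of the flipped sample, and count how often
    these null means are at least as extreme as the observed one."""
    obs = sum(data) / len(data)
    extreme = 0
    for _ in range(reps):
        flipped = [abs(v) * random.choice([-1, 1]) for v in data]
        if abs(sum(flipped) / len(flipped)) >= abs(obs):
            extreme += 1
    return extreme / reps  # two-sided p-value

# hypothetical sample clearly centred away from 0
sample = [1.8, 2.3, 0.4, 3.1, 2.7, 1.2, 2.9, 0.8, 2.2, 1.5]
print(sign_flip_pvalue(sample))  # small: well below 0.05
```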

We see that the chances that the randomly generated means exceed our real-world mean are about 0! So the mean is definitely different from 0.

# Set graph theme
theme_set(new = theme_custom())
#
# Calculate exact mean
obs_mean <- mean(~y, data = mydata)
belief1 <- 0 # What we think the mean is
obs_diff_mosaic <- obs_mean - belief1
obs_diff_mosaic
[1] 2.045689
## Steps in Permutation Test
## Repeatedly Shuffle polarities of data observations
## Take means
## Compare all means with the real-world observed one
null_dist_mosaic <-
  mosaic::do(9999) * mean(
    ~ abs(y) *
      sample(c(-1, 1), # +/- 1s multiply y
        length(y), # How many +/- 1s?
        replace = T
      ), # select with replacement
    data = mydata
  )
##
range(null_dist_mosaic$mean)
[1] -1.754293  1.473298
##
## Plot this NULL distribution
gf_histogram(
  ~mean,
  data = null_dist_mosaic,
  fill = ~ (mean >= obs_diff_mosaic),
  bins = 50, title = "Distribution of Permutation Means under Null Hypothesis",
  subtitle = "Why is the mean of the means zero??"
) %>%
  gf_labs(
    x = "Calculated Random Means",
    y = "How Often do these occur?"
  ) %>%
  gf_vline(xintercept = obs_diff_mosaic, colour = "red")

# p-value
# Null distributions are always centered around zero. Why?
prop(~ mean >= obs_diff_mosaic,
  data = null_dist_mosaic
)
prop_TRUE 
        0 

Let us try the bootstrap test now. Here we simulate samples similar to the one at hand by repeatedly sampling from the sample itself, with replacement: a process known as bootstrapping, or bootstrap sampling.
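The resampling loop itself is equally small; a Python sketch on a small hypothetical sample (the R code below uses mosaic::do() on y):

```python
import random

random.seed(7)  # for replication

def bootstrap_means(data, reps=4999):
    """Resample the sample itself, with replacement, recording the mean
    of each resample."""
    n = len(data)
    return [sum(random.choices(data, k=n)) / n for _ in range(reps)]

sample = [1.8, 2.3, 0.4, 3.1, 2.7, 1.2, 2.9, 0.8, 2.2, 1.5]
means = sorted(bootstrap_means(sample))

# percentile-style 95% interval from the bootstrap distribution
lo = means[int(0.025 * len(means))]
hi = means[int(0.975 * len(means))]
print(lo, hi)  # straddles the sample mean (1.89)
```

Note how no hypothesis enters the loop: the bootstrap distribution is centred near the sample mean, and a belief such as μ = 0 is judged by whether it falls inside this interval.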

# Set graph theme
theme_set(new = theme_custom())
##
## Resample with replacement from the one sample of 50
## Calculate the mean each time
null_toy_bs <- mosaic::do(4999) *
  mean(
    ~ sample(y,
      replace = T
    ), # select with replacement
    data = mydata
  )

## Plot this NULL distribution
gf_histogram(
  ~mean,
  data = null_toy_bs,
  bins = 50,
  title = "Distribution of Bootstrap Means"
) %>%
  gf_labs(
    x = "Calculated Random Means",
    y = "How Often do these occur?"
  ) %>%
  gf_vline(xintercept = ~belief1, colour = "red")

prop(~ mean >= belief1,
  data = null_toy_bs
) +
  prop(~ mean <= -belief1,
    data = null_toy_bs
  )
prop_TRUE 
        1 
Note: Permutation vs Bootstrap

There is a difference between the two. The bootstrap test uses the sample at hand to generate many similar samples without access to the population, and calculates the statistic needed (i.e. the mean). No Hypothesis is stated. The distribution of bootstrap means looks "similar" to what we might obtain by repeatedly sampling the population itself: it is centred around the sample mean, our estimate of the population parameter μ.

The permutation test generates many permutations of the data and computes the appropriate measure/statistic under the NULL hypothesis. Which is why the permutation test has a NULL distribution centred at 0, our NULL-hypothesis value in this case.

As student Sneha Manu Jacob remarked in class, Permutation flips the signs of the data values in our sample; Bootstrap flips the number of times each data value is (re)used. Good Insight!!

Yes, the t-test works, but what is really happening under the hood of the t-test? The inner mechanism of the t-test can be stated in the following steps:

  1. Calculate the mean of the sample, ȳ.
  2. Calculate the sd of the sample, and, assuming the sample is normally distributed, calculate the standard error (i.e. sd/√n).
  3. Take the difference between the sample mean ȳ and our expected/believed population mean μ.
  4. We expect the population mean to lie within the confidence interval of the sample mean ȳ.
  5. For a normally distributed sample, the confidence interval is given by ±1.96 * standard error, to be 95% sure that the sample mean is a good estimate for the population mean.
  6. Therefore, if the difference between actual and believed means lies far beyond the confidence interval, hmm… we cannot think our belief is correct, and we change our opinion.
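In symbols, the six steps collapse into a single test statistic; with the numbers from this sample (ȳ = 2.045689, s/√n = 0.3014752):

```latex
t \;=\; \frac{\bar{y} - \mu_0}{s/\sqrt{n}}
  \;=\; \frac{2.045689 - 0}{0.3014752}
  \;\approx\; 6.79
```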

Let us translate that mouthful into calculations!

mean_belief_pop <- 0.0 # Assert our belief
# Sample Mean
mean_y <- mean(y)
mean_y
[1] 2.045689
## Sample standard error
std_error <- sd(y) / sqrt(length(y))
std_error
[1] 0.3014752
## Confidence Interval of Observed Mean
conf_int <- tibble(ci_low = mean_y - 1.96 * std_error, ci_high = mean_y + 1.96 * std_error)
conf_int
ci_low   ci_high
1.454798 2.63658
(1 row)
## Difference between actual and believed mean
mean_diff <- mean_y - mean_belief_pop
mean_diff
[1] 2.045689
## Test Statistic
t <- mean_diff / std_error
t
[1] 6.785596

We see that the difference between means is 6.78 times the std_error! At a distance of 1.96 (either way) the probability of this data happening by chance already drops to 5%!! At this distance of 6.78, we would have negligible probability of this data occurring by chance!

How can we visualize this?

If X ~ N(2.046, 0.3015), then 
    P(X <= 1.443e-07) = P(Z <= -6.786) = 5.78e-12
    P(X >  1.443e-07) = P(Z >  -6.786) = 1

[1] 5.780412e-12

Case Study #2: Exam data

Let us now choose a dataset from the openintro package:

data("exam_grades")
exam_grades
semester sex   exam1 exam2 exam3 course_grade
2000-1   Man   84.5  69.5  86.5  76.2564
2000-1   Man   80.0  74.0  67.0  75.3882
2000-1   Man   56.0  70.0  71.5  67.0564
2000-1   Man   64.0  61.0  67.5  63.4538
2000-1   Man   90.5  72.5  75.0  72.3949
2000-1   Man   74.0  78.5  84.5  71.4128
2000-1   Man   60.5  44.0  58.0  56.0949
2000-1   Man   89.0  82.0  88.0  78.0103
2000-1   Woman 87.5  86.5  95.0  82.9026
2000-1   Man   91.0  98.0  88.0  89.0846
(1-10 of 233 rows shown)

Research Question

There are quite a few Quant variables in the data. Let us choose course_grade as our variable of interest. What might we wish to find out?

Note: Research Question

In general, the Teacher in this class is overly generous with grades unlike others we know of, and so the average course-grade is equal to 80% !!

Inspecting and Charting Data

# Set graph theme
theme_set(new = theme_custom())
#
exam_grades %>%
  gf_density(~course_grade) %>%
  gf_fitdistr(dist = "dnorm") %>%
  gf_labs(
    title = "Density of Course Grade",
    subtitle = "Compared with Normal Density"
  )

Hmm… the data look normally distributed. But this time we will not merely trust our eyes; we will do a test for it.

Testing Assumptions in the Data

Note: Is the data normally distributed?
stats::shapiro.test(x = exam_grades$course_grade) %>%
  broom::tidy()
statistic p.value  method
0.9939453 0.470688 Shapiro-Wilk normality test
(1 row)

The Shapiro-Wilk Test checks whether a data variable is normally distributed. Without digging into the maths of it, let us say that it assumes the variable is so distributed, and then computes the probability of seeing data like ours under that assumption. So a high p-value (0.47) is a good thing here: we have no grounds to reject normality.

When we have large Quant variables (i.e. with length >= 5000), shapiro.test does not work, and we use an Anderson-Darling [3] test to confirm normality:

library(nortest)
# Especially when we have >= 5000 observations
nortest::ad.test(x = exam_grades$course_grade) %>%
  broom::tidy()
statistic p.value   method
0.3306555 0.5118521 Anderson-Darling normality test
(1 row)

So course_grade is a normally-distributed variable. There are no exceptional students! Hmph!

Inference

  • t.test
  • Wilcoxon test
  • Using Permutation and Bootstrap

A. Model

We have that mean(course_grade) = β0. As before, we formulate "our (dis)belief" in this sample mean with a NULL Hypothesis about the population, as follows:

H0: μ = 80

Ha: μ ≠ 80

B. Code

# t-test
t4 <- mosaic::t_test(
  exam_grades$course_grade, # Name of variable
  mu = 80, # belief
  alternative = "two.sided"
) %>% # Check both sides
  broom::tidy()
t4
estimate statistic p.value      parameter conf.low conf.high method            alternative
72.23883 -12.07999 2.187064e-26 232       70.97299 73.50467  One Sample t-test two.sided
(1 row)

So, we can reject the NULL Hypothesis that the average grade, in the population of students who have taken this class, is 80: there is a minuscule chance of seeing an observed sample mean of 72.238 if the population mean μ had really been 80.

# Wilcoxon signed-rank test
t5 <- wilcox.test(
  exam_grades$course_grade, # Name of variable
  mu = 90, # belief
  alternative = "two.sided",
  conf.int = TRUE,
  conf.level = 0.95
) %>% # Check both sides

  broom::tidy() # Make results presentable, and plottable!!
t5
estimate statistic p.value      conf.low conf.high
72.42511 75        1.487917e-39 71.15002 73.71426
(1 row; 5 of 7 columns shown)

This test too suggests that the average course grade is different from 80: its confidence interval (71.15, 73.71) does not include 80.

Note: Why compare on both sides?

Note that we have computed whether the average course_grade is generally different from 80 for this Teacher. We could have computed whether it is greater, or lesser than 80 ( or any other number too). Read this article for why it is better to do a โ€œtwo.sidedโ€ test in most cases.

# Set graph theme
theme_set(new = theme_custom())
#
# Calculate exact mean
obs_mean_grade <- mean(~course_grade, data = exam_grades)
belief <- 80
obs_grade_diff <- belief - obs_mean_grade
## Steps in a Permutation Test
## Repeatedly Shuffle polarities of data observations
## Take means
## Compare all means with the real-world observed one
null_dist_grade <-
  mosaic::do(4999) *
    mean(
      ~ (course_grade - belief) *
        sample(c(-1, 1), # +/- 1s multiply y
          length(course_grade), # How many +/- 1s?
          replace = T
        ), # select with replacement
      data = exam_grades
    )

## Plot this NULL distribution
gf_histogram(
  ~mean,
  data = null_dist_grade,
  fill = ~ (mean >= obs_grade_diff),
  bins = 50,
  title = "Distribution of Permuted Difference-Means under Null Hypothesis",
  subtitle = "Why is the mean of the means zero??"
) %>%
  gf_labs(
    x = "Calculated Random Means",
    y = "How Often do these occur?"
  ) %>%
  gf_vline(xintercept = obs_grade_diff, colour = "red") %>%
  gf_vline(xintercept = -obs_grade_diff, colour = "red")

# p-value
# Permutation distributions are always centered around zero. Why?
prop(~ mean >= obs_grade_diff,
  data = null_dist_grade
) +
  prop(~ mean <= -obs_grade_diff,
    data = null_dist_grade
  )
prop_TRUE 
        0 

And let us now do the bootstrap test:

null_grade_bs <- mosaic::do(4999) *
  mean(
    ~ sample(course_grade,
      replace = T
    ), # select with replacement
    data = exam_grades
  )

## Plot this NULL distribution
gf_histogram(
  ~mean,
  data = null_grade_bs,
  fill = ~ (mean >= obs_grade_diff),
  bins = 50,
  title = "Distribution of Bootstrap Means"
) %>%
  gf_labs(
    x = "Calculated Random Means",
    y = "How Often do these occur?"
  ) %>%
  gf_vline(xintercept = ~belief, colour = "red")

prop(~ mean >= belief,
  data = null_grade_bs
) +
  prop(~ mean <= -belief,
    data = null_grade_bs
  )
prop_TRUE 
        0 

The permutation test shows that we are not able to โ€œgenerateโ€ the believed mean-difference with any of the permutations. Likewise with the bootstrap, we are not able to hit the believed mean with any of the bootstrap samples.

Hence there is no reason to hold on to our belief (80), and we reject our NULL Hypothesis that the population mean is equal to 80.

Workflow for Inference for a Single Mean

A series of tests deal with the mean of a single sample. The idea is to evaluate whether that mean is representative of the mean of the underlying population. Depending upon the nature of the (single) variable, the tests that can be used are as follows:

  • Check Assumptions first. Normality: the Shapiro-Wilk Test (shapiro.test), or the Anderson-Darling Test for large samples.
  • Assumptions OK? Yes: Parametric route, the t.test.
  • Assumptions not OK? Non-Parametric routes: the wilcox.test, a t.test with Signed-Ranks of the data, a Bootstrap test, or a Permutation test.
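Read as code, the workflow is a small dispatch rule. A Python sketch for illustration (the have_compute flag is an invented knob for the resampling shortcut mentioned in the Conclusion):

```python
def choose_test(normality_p, alpha=0.05, have_compute=True):
    """Pick an inference route for a single mean, following the
    workflow above: normality check first, then parametric vs not."""
    if normality_p > alpha:
        return "t.test"  # normality not rejected: parametric route
    if have_compute:
        return "bootstrap / permutation test"  # assumption-light route
    return "wilcox.test (or t.test on signed ranks)"

print(choose_test(0.47))   # the exam-grades case
print(choose_test(0.001))  # a clearly non-normal variable
```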

Wait, But Why?

  • We can only sample from a population, and calculate sample statistics
  • But we still want to know about population parameters
  • All our tests and measures of uncertainty with samples are aimed at obtaining a confident measure of a population parameter.
  • Means are the first on the list!

Conclusion

  • If samples are normally distributed, we use a t.test.
  • Else we try non-parametric tests such as the Wilcoxon test.
  • Since we now have compute power at our fingertips, we can leave off considerations of normality and simply proceed with either a permutation or a bootstrap test.

References

  1. OpenIntro Modern Statistics, Chapter #17

  2. Bootstrap based Inference using the infer package: https://infer.netlify.app/articles/t_test

  3. Michael Clark & Seth Berry. Models Demystified: A Practical Guide from t-tests to Deep Learning. https://m-clark.github.io/book-of-models/

  4. University of Warwick. SAMPLING: Searching for the Approximation Method used to Perform rational inference by Individuals and Groups. https://sampling.warwick.ac.uk/#Overview

Additional Readings

  1. https://mine-cetinkaya-rundel.github.io/quarto-tip-a-day/posts/21-diagrams/
R Package Citations
Package Version Citation
explore 1.3.3 Krasser (2024)
infer 1.0.7 Couch et al. (2021)
openintro 2.5.0 ร‡etinkaya-Rundel et al. (2024)
resampledata 0.3.2 Chihara and Hesterberg (2018)
TeachHist 0.2.1 Lange (2023)
TeachingDemos 2.13 Snow (2024)
ร‡etinkaya-Rundel, Mine, David Diez, Andrew Bray, Albert Y. Kim, Ben Baumer, Chester Ismay, Nick Paterno, and Christopher Barr. 2024. openintro: Datasets and Supplemental Functions from โ€œOpenIntroโ€ Textbooks and Labs. https://CRAN.R-project.org/package=openintro.
Chihara, Laura M., and Tim C. Hesterberg. 2018. Mathematical Statistics with Resampling and r. John Wiley & Sons Hoboken NJ. https://github.com/lchihara/MathStatsResamplingR?tab=readme-ov-file.
Couch, Simon P., Andrew P. Bray, Chester Ismay, Evgeni Chasnovski, Benjamin S. Baumer, and Mine ร‡etinkaya-Rundel. 2021. โ€œinfer: An R Package for Tidyverse-Friendly Statistical Inference.โ€ Journal of Open Source Software 6 (65): 3661. https://doi.org/10.21105/joss.03661.
Krasser, Roland. 2024. explore: Simplifies Exploratory Data Analysis. https://CRAN.R-project.org/package=explore.
Lange, Carsten. 2023. TeachHist: A Collection of Amended Histograms Designed for Teaching Statistics. https://CRAN.R-project.org/package=TeachHist.
Snow, Greg. 2024. TeachingDemos: Demonstrations for Teaching and Learning. https://CRAN.R-project.org/package=TeachingDemos.

Footnotes

  1. https://allendowney.blogspot.com/2016/06/there-is-still-only-one-test.html

  2. https://stats.stackexchange.com/q/171748

  3. https://www.r-bloggers.com/2021/11/anderson-darling-test-in-r-quick-normality-check/

Citation

BibTeX citation:
@online{v.2022,
  author = {V., Arvind},
  title = {🍃 {Inference} for a {Single} {Mean}},
  date = {2022-11-10},
  url = {https://av-quarto.netlify.app/content/courses/Analytics/Inference/Modules/100-OneMean/},
  langid = {en},
  abstract = {Inference Tests for a Single population Mean}
}
For attribution, please cite this work as:
V., Arvind. 2022. "🍃 Inference for a Single Mean." November 10, 2022. https://av-quarto.netlify.app/content/courses/Analytics/Inference/Modules/100-OneMean/.

License: CC BY-SA 2.0

Website made with ❤️ and Quarto, by Arvind V.

Hosted by Netlify.