library(tidyverse)
library(mosaic) # Our go-to package
library(ggformula)
library(infer) # An alternative package for inference using tidy data
library(broom) # Clean test results in tibble form
library(skimr) # data inspection
library(explore) # New, Easy package for Stats Test and Viz, and other things
library(resampledata3) # Datasets from Chihara and Hesterberg's book
library(openintro) # datasets
library(gt) # for tables
Inference for Two Independent Means
Setting up R Packages
Introduction
flowchart TD
A[Inference for Two Independent Means] -->|Check Assumptions| B[Normality: Shapiro-Wilk Test shapiro.test\n Variances: Fisher F-test var.test]
B --> C{OK?}
C -->|Yes, both\n Parametric| D[t.test]
D <-->F[Linear Model\n Method]
C -->|Yes, but not variance\n Parametric| W[t.test with\n Welch Correction]
W<-->F
C -->|No\n Non-Parametric| E[wilcox.test]
E <--> G[Linear Model\n with\n Signed-Ranks]
C -->|No\n Non-Parametric| P[Bootstrap\n or\n Permutation] P <--> Q[Linear Model\n with Signed-Rank\n with Permutation]
flowchart TD A[Inference for Two Independent Means] -->|Check Assumptions| B[Normality: Shapiro-Wilk Test shapiro.test\n Variances: Fisher F-test var.test] B --> C{OK?} C -->|Yes, both\n Parametric| D[t.test] D <-->F[Linear Model\n Method] C -->|Yes, but not variance\n Parametric| W[t.test with\n Welch Correction] W<-->F C -->|No\n Non-Parametric| E[wilcox.test] E <--> G[Linear Model\n with\n Signed-Ranks] C -->|No\n Non-Parametric| P[Bootstrap\n or\n Permutation] P <--> Q[Linear Model\n with Signed-Rank\n with Permutation]
Case Study #1: A Simple Data set with Two Quant Variables
Let us look at the MathAnxiety
dataset from the package resampledata
. Here we have โanxietyโ scores for boys and girls, for different components of mathematics.
Inspecting and Charting Data
data("MathAnxiety")
MathAnxiety
MathAnxiety_inspect <- inspect(MathAnxiety)
MathAnxiety_inspect$categorical
MathAnxiety_inspect$quantitative
We have ~600 data entries, and with 4 Quant variables; Age
,AMAS
, RCMAS
, and AMAS
; and two Qual variables, Gender
and Grade
. A simple dataset, with enough entries to make it worthwhile to explore as our first example.
Is there a difference between boy and girl โanxietyโ levels for AMAS (test) in the MathAnxiety dataset?
First, histograms, densities and counts of the variable we are interested in, after converting data into long format:
# Set graph theme
theme_set(new = theme_custom())
#
MathAnxiety %>%
gf_density(~ AMAS,
fill = ~ Gender,
alpha = 0.5,
title = "")
MathAnxiety %>%
gf_boxplot(AMAS ~ Gender,
fill = ~ Gender,
alpha = 0.5,
title = "")
MathAnxiety %>% count(Gender)
MathAnxiety %>% group_by(Gender) %>% summarise(mean = mean(AMAS))
The distributions for anxiety scores for boys and girls overlap considerably and are very similar, though the boxplot for boys shows a significant outlier. Are they close to being normal distributions too? We should check.
A. Check for Normality
Statistical tests for means usually require a couple of checks1 2:
- Are the data normally distributed?
- Are the data variances similar?
Let us complete a check for normality: the shapiro.wilk
test checks whether a Quant variable is from a normal distribution; the NULL hypothesis is that the data are from a normal distribution. We will also look at Q-Q plots for both variables:
# Set graph theme
theme_set(new = theme_custom())
#
MathAnxiety %>%
gf_density( ~ AMAS,
fill = ~ Gender,
alpha = 0.5,
title = "Math Anxiety scores for boys and girls") %>%
gf_facet_grid(~ Gender) %>%
gf_fitdistr(dist = "dnorm")
MathAnxiety%>%
gf_qq(~ AMAS, color = ~ Gender) %>%
gf_qqline(ylab = "Math Anxiety Score..are they Normally Disributed?") %>%
gf_facet_wrap(~ Gender) # independent y-axis
boys_AMAS <- MathAnxiety %>% filter(Gender== "Boy") %>% select(AMAS)
girls_AMAS <- MathAnxiety %>% filter(Gender== "Girl") %>% select(AMAS)
shapiro.test(boys_AMAS$AMAS)
shapiro.test(girls_AMAS$AMAS)
The distributions for anxiety scores for boys and girls are almost normal, visually speaking. With the Shapiro-Wilk test we find that the scores for girls
are normally distributed, but the boys
scores are not so.
The p.value
obtained in the shapiro.wilk
test suggests the chances of the data, given the Assumption that they are normally distributed.
We see that MathAnxiety contains discrete-level scores for anxiety for the two variables (for Boys and Girls) anxiety scores. The boys
score has a significant outlier, which we saw earlier and perhaps that makes that variable lose out, perhaps narrowly.
B. Check for Variances
Let us check if the two variables have similar variances: the var.test
does this for us, with a NULL hypothesis that the variances are not significantly different:
var.test(AMAS ~ Gender, data = MathAnxiety,
conf.int = TRUE,conf.level = 0.95) %>%
broom::tidy()
qf(0.975,275,322)
[1] 1.254823
The variances are quite similar as seen by the \(p.value = 0.82\). We also saw it visually when we plotted the overlapped distributions earlier.
- The two variables are not both normally distributed.
- The two variances are significantly similar.
Hypothesis
Based on the graphs, how would we formulate our Hypothesis? We wish to infer whether there is any difference in the mean anxiety score between Girls and Boys in the dataset MathAnxiety
. So accordingly:
\[ H_0: \mu_{Boys} = \mu_{Girls}\\ \] \[ H_a: \mu_{Girls} \ne \mu_{Boys}\\ \]
Observed and Test Statistic
What would be the test statistic we would use? The difference in means. Is the observed difference in the means between the two groups of scores non-zero? We use the diffmean
function:
obs_diff_amas <- diffmean(AMAS ~ Gender, data = MathAnxiety)
obs_diff_amas
diffmean
1.7676
Girlsโ AMAS
anxiety scores are, on average, \(1.76\) points higher than those for Boys.
Different tests here will show the difference as positive or negative, but with the same value! This depends upon the way the factor variable Gender
is used, i.e. Boy-Girl or Girl-Boyโฆ
Inference
Since the data are not both normally distributed, though the variances similar, we typically cannot use a parametric t.test
. However, we can still examine the results:
The p.value
is \(0.001\) ! And the Confidence Interval
does not straddle \(0\). So the t.test
gives us good reason to reject the Null Hypothesis that the means are similar and that there is a significant difference between Boys and Girls when it comes to AMAS
anxiety. But can we really believe this, given the non-normality of data?
Since the data variables do not satisfy the assumption of being normally distributed, and though the variances are similar, we use the classical wilcox.test
Type help(wilcox.test)
in your Console., which implements what we need here: the Mann-Whitney U test:3
The Mann-Whitney test as a test of mean ranks. It first ranks all your values from high to low, computes the mean rank in each group, and then computes the probability that random shuffling of those values between two groups would end up with the mean ranks as far apart as, or further apart, than you observed. No assumptions about distributions are needed so far. (emphasis mine)
\[ mean(rank(AMAS_{Girls})) - mean(rank(AMAS_{Boys})) = \beta_1;\\ \] \[ H_0: \beta_1 = 0;\\ \\ \] \[ H_a: \beta_1 \ne 0 \]
wilcox.test(AMAS ~ Gender, data = MathAnxiety,
conf.int = TRUE,
conf.level = 0.95) %>%
broom::tidy()
The p.value
is very similar, \(0.00077\), and again the Confidence Interval
does not straddle \(0\), and we are hence able to reject the NULL hypothesis that the means are equal and accept the alternative hypothesis that there is a significant difference in mean anxiety scores between Boys and Girls.
We can apply the linear-model-as-inference interpretation to the ranked data data to implement the non-parametric test as a Linear Model:
\[ lm(rank(AMAS) \sim gender) = \beta_0 + \beta_1 * gender \] \[ H_0: \beta_1 = 0\\ \] \[ H_a: \beta_1 \ne 0\\ \]
lm
Note how the Qual variable was used here in Linear Regression! The Gender
variable was treated as a binary โdummyโ variable4.
Here too we see that the p.value
for the slope term (โGenderGirlโ) is significant at \(7.4*10^{-4}\).
We saw from the diagram created by Allen Downey that there is only one test5! We will now use this philosophy to develop a technique that allows us to mechanize several Statistical Models in that way, with nearly identical code.
We pretend that Gender
has not effect on the AMAS
anxiety scores. If this is our position, then the Gender
labels are essentially meaningless, and we can pretend that any AMAS
score belongs to a Boy or a Girl. This means we can mosaic::shuffle
(permute) the Gender
labels and see how uncommon our real data is. And we do not have to really worry about whether the data are normally distributed, or if their variances are nearly equal.
The โpretendโ position is exactly the NULL Hypothesis!! The โuncommonโ part is the p.value
under NULL!!
# Set graph theme
theme_set(new = theme_custom())
#
null_dist_amas <-
do(9999) * diffmean(data = MathAnxiety, AMAS ~ shuffle(Gender))
null_dist_amas
###
gf_histogram(data = null_dist_amas, ~ diffmean, bins = 25) %>%
gf_vline(xintercept = obs_diff_amas, colour = "red")
###
gf_ecdf(data = null_dist_amas, ~ diffmean) %>%
gf_vline(xintercept = obs_diff_amas, colour = "red")
###
1-prop1(~ diffmean <= obs_diff_amas, data = null_dist_amas)
Clearly the observed_diff_amas
is much beyond anything we can generate with permutations with gender
! And hence there is a significant difference in weights across gender
!
All Tests Together
We can put all the test results together to get a few more insights about the tests:
mosaic::t_test(AMAS ~ Gender,
data = MathAnxiety) %>%
broom::tidy() %>%
gt() %>%
tab_style(
style = list(cell_fill(color = "cyan"), cell_text(weight = "bold")),
locations = cells_body(columns = p.value)) %>%
tab_header(title = "t.test")
wilcox.test(AMAS ~ Gender,
data = MathAnxiety) %>%
broom::tidy() %>%
gt() %>%
tab_style(
style = list(cell_fill(color = "cyan"), cell_text(weight = "bold")),
locations = cells_body(columns = p.value)) %>%
tab_header(title = "wilcox.test")
lm(AMAS ~ Gender,
data = MathAnxiety) %>%
broom::tidy(conf.int = TRUE,
conf.level = 0.95) %>%
gt() %>%
tab_style(
style = list(cell_fill(color = "cyan"),cell_text(weight = "bold")),
locations = cells_body(columns = p.value)) %>%
tab_header(title = "Linear Model with Original Data")
lm(rank(AMAS) ~ Gender,
data = MathAnxiety) %>%
broom::tidy(conf.int = TRUE,
conf.level = 0.95) %>%
gt() %>%
tab_style(
style = list(cell_fill(color = "cyan"),cell_text(weight = "bold")),
locations = cells_body(columns = p.value)) %>%
tab_header(title = "Linear Model with Ranked Data")
t.test | |||||||||
---|---|---|---|---|---|---|---|---|---|
estimate | estimate1 | estimate2 | statistic | p.value | parameter | conf.low | conf.high | method | alternative |
-1.7676 | 21.16718 | 22.93478 | -3.291843 | 0.001055808 | 580.2004 | -2.822229 | -0.7129706 | Welch Two Sample t-test | two.sided |
wilcox.test | |||
---|---|---|---|
statistic | p.value | method | alternative |
37483 | 0.0007736219 | Wilcoxon rank sum test with continuity correction | two.sided |
Linear Model with Original Data | ||||||
---|---|---|---|---|---|---|
term | estimate | std.error | statistic | p.value | conf.low | conf.high |
(Intercept) | 21.16718 | 0.3641315 | 58.130602 | 5.459145e-248 | 20.4520482 | 21.882317 |
GenderGirl | 1.76760 | 0.5364350 | 3.295087 | 1.042201e-03 | 0.7140708 | 2.821129 |
Linear Model with Ranked Data | ||||||
---|---|---|---|---|---|---|
term | estimate | std.error | statistic | p.value | conf.low | conf.high |
(Intercept) | 278.04644 | 9.535561 | 29.158898 | 6.848992e-117 | 259.31912 | 296.7738 |
GenderGirl | 47.64559 | 14.047696 | 3.391701 | 7.405210e-04 | 20.05668 | 75.2345 |
Footnotes
https://stats.stackexchange.com/questions/2492/is-normality-testing-essentially-uselessโฉ๏ธ
https://www.allendowney.com/blog/2023/01/28/never-test-for-normality/โฉ๏ธ
https://www.middleprofessor.com/files/applied-biostatistics_bookdown/_book/intro-linear-models.html#a-linear-model-can-be-fit-to-data-with-continuous-discrete-or-categorical-x-variablesโฉ๏ธ
https://allendowney.blogspot.com/2016/06/there-is-still-only-one-test.htmlโฉ๏ธ
https://stats.stackexchange.com/questions/92374/testing-large-dataset-for-normality-how-and-is-it-reliableโฉ๏ธ
https://en.wikipedia.org/wiki/Dummy_variable_(statistics)โฉ๏ธ