library(tidyverse)
library(mosaic) # Our go-to package
library(ggformula)
library(infer) # An alternative package for inference using tidy data
library(broom) # Clean test results in tibble form
library(skimr) # data inspection
library(resampledata3) # Datasets from Chihara and Hesterberg's book
library(openintro) # datasets
library(gt) # for tables
Inference for Two Independent Means
Setting up R Packages
Introduction
flowchart TD A[Inference for Two Independent Means] -->|Check Assumptions| B[Normality: Shapiro-Wilk Test shapiro.test\n Variances: Fisher F-test var.test] B --> C{OK?} C -->|Yes, both\n Parametric| D[t.test] D <-->F[Linear Model\n Method] C -->|Yes, but not variance\n Parametric| W[t.test with\n Welch Correction] W<-->F C -->|No\n Non-Parametric| E[wilcox.test] E <--> G[Linear Model\n with\n Signed-Ranks] C -->|No\n Non-Parametric| P[Bootstrap\n or\n Permutation] P <--> Q[Linear Model\n with Signed-Rank\n with Permutation]
Case Study #1: A Simple Data set with Two Quant Variables
Let us look at the MathAnxiety
dataset from the package resampledata
. Here we have βanxietyβ scores for boys and girls, for different components of mathematics.
Inspecting and Charting Data
data("MathAnxiety")
MathAnxiety
MathAnxiety_inspect <- inspect(MathAnxiety)
MathAnxiety_inspect$categorical
MathAnxiety_inspect$quantitative
We have ~600 data entries, and with 4 Quant variables; Age
,AMAS
, RCMAS
, and AMAS
; and two Qual variables, Gender
and Grade
. A simple dataset, with enough entries to make it worthwhile to explore as our first example.
Is there a difference between boy and girl βanxietyβ levels for AMAS (test) in the MathAnxiety dataset?
First, histograms, densities and counts of the variable we are interested in, after converting data into long format:
# Set graph theme
theme_set(new = theme_custom())
#
MathAnxiety %>%
gf_density(~ AMAS,
fill = ~ Gender,
alpha = 0.5,
title = "")
MathAnxiety %>%
gf_boxplot(AMAS ~ Gender,
fill = ~ Gender,
alpha = 0.5,
title = "")
MathAnxiety %>% count(Gender)
MathAnxiety %>% group_by(Gender) %>% summarise(mean = mean(AMAS))
The distributions for anxiety scores for boys and girls overlap considerably and are very similar, though the boxplot for boys shows a significant outlier. Are they close to being normal distributions too? We should check.
A. Check for Normality
Statistical tests for means usually require a couple of checks1 2:
- Are the data normally distributed?
- Are the data variances similar?
Let us complete a check for normality: the shapiro.wilk
test checks whether a Quant variable is from a normal distribution; the NULL hypothesis is that the data are from a normal distribution. We will also look at Q-Q plots for both variables:
# Set graph theme
theme_set(new = theme_custom())
#
MathAnxiety %>%
gf_density( ~ AMAS,
fill = ~ Gender,
alpha = 0.5,
title = "Math Anxiety scores for boys and girls") %>%
gf_facet_grid(~ Gender) %>%
gf_fitdistr(dist = "dnorm")
MathAnxiety%>%
gf_qq(~ AMAS, color = ~ Gender) %>%
gf_qqline(ylab = "Math Anxiety Score..are they Normally Disributed?") %>%
gf_facet_wrap(~ Gender) # independent y-axis
boys_AMAS <- MathAnxiety %>% filter(Gender== "Boy") %>% select(AMAS)
girls_AMAS <- MathAnxiety %>% filter(Gender== "Girl") %>% select(AMAS)
shapiro.test(boys_AMAS$AMAS)
shapiro.test(girls_AMAS$AMAS)
The distributions for anxiety scores for boys and girls are almost normal, visually speaking. With the Shapiro-Wilk test we find that the scores for girls
are normally distributed, but the boys
scores are not so.
The p.value
obtained in the shapiro.wilk
test suggests the chances of the data, given the Assumption that they are normally distributed.
We see that MathAnxiety contains discrete-level scores for anxiety for the two variables (for Boys and Girls) anxiety scores. The boys
score has a significant outlier, which we saw earlier and perhaps that makes that variable lose out, perhaps narrowly.
B. Check for Variances
Let us check if the two variables have similar variances: the var.test
does this for us, with a NULL hypothesis that the variances are not significantly different:
var.test(AMAS ~ Gender, data = MathAnxiety,
conf.int = TRUE,conf.level = 0.95) %>%
broom::tidy()
qf(0.975,275,322)
[1] 1.254823
The variances are quite similar as seen by the \(p.value = 0.82\). We also saw it visually when we plotted the overlapped distributions earlier.
- The two variables are not both normally distributed.
- The two variances are significantly similar.
Hypothesis
Based on the graphs, how would we formulate our Hypothesis? We wish to infer whether there is any difference in the mean anxiety score between Girls and Boys in the dataset MathAnxiety
. So accordingly:
\[ H_0: \mu_{Boys} = \mu_{Girls}\\ \]
\[ H_a: \mu_{Girls} \ne \mu_{Boys}\\ \]
Observed and Test Statistic
What would be the test statistic we would use? The difference in means. Is the observed difference in the means between the two groups of scores non-zero? We use the diffmean
function:
obs_diff_amas <- diffmean(AMAS ~ Gender, data = MathAnxiety)
obs_diff_amas
diffmean
1.7676
Girlsβ AMAS
anxiety scores are, on average, \(1.76\) points higher than those for Boys.
Different tests here will show the difference as positive or negative, but with the same value! This depends upon the way the factor variable Gender
is used, i.e. Boy-Girl or Girl-Boyβ¦
Inference
help(wilcox.test)
in your Console.Since the data are not both normally distributed, though the variances similar, we typically cannot use a parametric t.test
. However, we can still examine the results:
The p.value
is \(0.001\) ! And the Confidence Interval
does not straddle \(0\). So the t.test
gives us good reason to reject the Null Hypothesis that the means are similar and that there is a significant difference between Boys and Girls when it comes to AMAS
anxiety. But can we really believe this, given the non-normality of data?
Since the data variables do not satisfy the assumption of being normally distributed, and though the variances are similar, we use the classical wilcox.test
, which implements what we need here: the Mann-Whitney U test:3
The Mann-Whitney test as a test of mean ranks. It first ranks all your values from high to low, computes the mean rank in each group, and then computes the probability that random shuffling of those values between two groups would end up with the mean ranks as far apart as, or further apart, than you observed. No assumptions about distributions are needed so far. (emphasis mine)
\[ mean(rank(AMAS_{Girls})) - mean(rank(AMAS_{Boys})) = \beta_1;\\ \]
\[ H_0: \beta_1 = 0;\\ \\ \]
\[ H_a: \beta_1 \ne 0 \]
wilcox.test(AMAS ~ Gender, data = MathAnxiety,
conf.int = TRUE,
conf.level = 0.95) %>%
broom::tidy()
The p.value
is very similar, \(0.00077\), and again the Confidence Interval
does not straddle \(0\), and we are hence able to reject the NULL hypothesis that the means are equal and accept the alternative hypothesis that there is a significant difference in mean anxiety scores between Boys and Girls.
We can apply the linear-model-as-inference interpretation to the ranked data data to implement the non-parametric test as a Linear Model:
\[ lm(rank(AMAS) \sim gender) = \beta_0 + \beta_1 * gender \]
\[ H_0: \beta_1 = 0\\ \]
\[ H_a: \beta_1 \ne 0\\ \]
lm
Note how the Qual variable was used here in Linear Regression! The Gender
variable was treated as a binary βdummyβ variable4.
Here too we see that the p.value
for the slope term (βGenderGirlβ) is significant at \(7.4*10^{-4}\).
We saw from the diagram created by Allen Downey that there is only one test5! We will now use this philosophy to develop a technique that allows us to mechanize several Statistical Models in that way, with nearly identical code.
We pretend that Gender
has not effect on the AMAS
anxiety scores. If this is our position, then the Gender
labels are essentially meaningless, and we can pretend that any AMAS
score belongs to a Boy or a Girl. This means we can mosaic::shuffle
(permute) the Gender
labels and see how uncommon our real data is. And we do not have to really worry about whether the data are normally distributed, or if their variances are nearly equal.
The βpretendβ position is exactly the NULL Hypothesis!! The βuncommonβ part is the p.value
under NULL!!
# Set graph theme
theme_set(new = theme_custom())
#
null_dist_amas <-
do(9999) * diffmean(data = MathAnxiety, AMAS ~ shuffle(Gender))
null_dist_amas
###
gf_histogram(data = null_dist_amas, ~ diffmean, bins = 25) %>%
gf_vline(xintercept = obs_diff_amas, colour = "red")
###
gf_ecdf(data = null_dist_amas, ~ diffmean) %>%
gf_vline(xintercept = obs_diff_amas, colour = "red")
###
1-prop1(~ diffmean <= obs_diff_amas, data = null_dist_amas)
Clearly the observed_diff_amas
is much beyond anything we can generate with permutations with gender
! And hence there is a significant difference in weights across gender
!
All Tests Together
We can put all the test results together to get a few more insights about the tests:
mosaic::t_test(AMAS ~ Gender,
data = MathAnxiety) %>%
broom::tidy() %>%
gt() %>%
tab_style(
style = list(cell_fill(color = "cyan"), cell_text(weight = "bold")),
locations = cells_body(columns = p.value)) %>%
tab_header(title = "t.test")
wilcox.test(AMAS ~ Gender,
data = MathAnxiety) %>%
broom::tidy() %>%
gt() %>%
tab_style(
style = list(cell_fill(color = "cyan"), cell_text(weight = "bold")),
locations = cells_body(columns = p.value)) %>%
tab_header(title = "wilcox.test")
lm(AMAS ~ Gender,
data = MathAnxiety) %>%
broom::tidy(conf.int = TRUE,
conf.level = 0.95) %>%
gt() %>%
tab_style(
style = list(cell_fill(color = "cyan"),cell_text(weight = "bold")),
locations = cells_body(columns = p.value)) %>%
tab_header(title = "Linear Model with Original Data")
lm(rank(AMAS) ~ Gender,
data = MathAnxiety) %>%
broom::tidy(conf.int = TRUE,
conf.level = 0.95) %>%
gt() %>%
tab_style(
style = list(cell_fill(color = "cyan"),cell_text(weight = "bold")),
locations = cells_body(columns = p.value)) %>%
tab_header(title = "Linear Model with Ranked Data")
t.test | |||||||||
---|---|---|---|---|---|---|---|---|---|
estimate | estimate1 | estimate2 | statistic | p.value | parameter | conf.low | conf.high | method | alternative |
-1.7676 | 21.16718 | 22.93478 | -3.291843 | 0.001055808 | 580.2004 | -2.822229 | -0.7129706 | Welch Two Sample t-test | two.sided |
wilcox.test | |||
---|---|---|---|
statistic | p.value | method | alternative |
37483 | 0.0007736219 | Wilcoxon rank sum test with continuity correction | two.sided |
Linear Model with Original Data | ||||||
---|---|---|---|---|---|---|
term | estimate | std.error | statistic | p.value | conf.low | conf.high |
(Intercept) | 21.16718 | 0.3641315 | 58.130602 | 5.459145e-248 | 20.4520482 | 21.882317 |
GenderGirl | 1.76760 | 0.5364350 | 3.295087 | 1.042201e-03 | 0.7140708 | 2.821129 |
Linear Model with Ranked Data | ||||||
---|---|---|---|---|---|---|
term | estimate | std.error | statistic | p.value | conf.low | conf.high |
(Intercept) | 278.04644 | 9.535561 | 29.158898 | 6.848992e-117 | 259.31912 | 296.7738 |
GenderGirl | 47.64559 | 14.047696 | 3.391701 | 7.405210e-04 | 20.05668 | 75.2345 |
As we can see, all tests are in agreement that there is a significant effect of Gender
on the AMAS
anxiety scores!
Case Study #2: Youth Risk Behavior Surveillance System (YRBSS) survey
Every two years, the Centers for Disease Control and Prevention in the USA conduct the Youth Risk Behavior Surveillance System (YRBSS) survey, where it takes data from highschoolers (9th through 12th grade), to analyze health patterns. We will work with a selected group of variables from a random sample of observations during one of the years the YRBSS was conducted.
help(yrbss)
Inspecting and Charting Data
data(yrbss)
yrbss
yrbss_inspect <- inspect(yrbss)
yrbss_inspect$categorical
yrbss_inspect$quantitative
We have 13K data entries, and with 13 different variables, some Qual and some Quant. Many entries are missing too, typical of real-world data and something we will have to account for in our computations. The meaning of each variable can be found by bringing up the help file.
help(yrbss)
In this Case Study, our research question is:
Does weight
of highschoolers in this dataset vary with gender
?
Inspecting and Charting Data
First, histograms, densities and counts of the variable we are interested in:
# Set graph theme
theme_set(new = theme_custom())
#
yrbss_select_gender <- yrbss %>%
select(weight, gender) %>%
drop_na(weight) # Sadly dropping off NA data
###
yrbss_select_gender %>%
gf_density( ~ weight,
fill = ~ gender,
alpha = 0.5,
title = "Highschoolers' Weights by Gender")
###
yrbss_select_gender %>%
gf_boxplot(weight ~ gender,
fill = ~ gender,
alpha = 0.5,
title = "Highschoolers' Weights by Gender")
###
yrbss_select_gender %>% count(gender)
Overlapped Distribution plot shows some difference in the means; and the Boxplots show visible difference in the medians.
A. Check for Normality
As stated before, statistical tests for means usually require a couple of checks:
- Are the data normally distributed?
- Are the data variances similar?
Let us also complete a visual check for normality, with plots since we cannot do a shapiro.test
:
The longest data it can take (in R) is 5000. Since our data is longer, we will cannot use this procedure and have to resort to visual means.
Let us plot frequency distribution and Q-Q plots6 for both variables.
# Set graph theme
theme_set(new = theme_custom())
#
male_student_weights <- yrbss_select_gender %>%
filter(gender == "male") %>%
select(weight)
female_student_weights <- yrbss_select_gender %>%
filter(gender == "female") %>%
select(weight)
#shapiro.test(male_student_weights$weight)
#shapiro.test(female_student_weights$weight)
yrbss_select_gender %>%
gf_density( ~ weight,
fill = ~ gender,
alpha = 0.5,
title = "Highschoolers' Weights by Gender") %>%
gf_facet_grid(~ gender) %>%
gf_fitdistr(dist = "dnorm")
yrbss_select_gender %>%
gf_qq(~ weight | gender) %>%
gf_qqline(ylab = "scores")
Distributions are not too close to normalβ¦perhaps a hint of a rightward skew, suggesting that there are some obese students.
No real evidence (visually) of the variables being normally distributed.
B. Check for Variances
Let us check if the two variables have similar variances: the var.test
does this for us, with a NULL hypothesis that the variances are not significantly different:
var.test( weight ~ gender, data = yrbss_select_gender,
conf.int = TRUE,
conf.level = 0.95) %>%
broom::tidy()
#qf(0.975,6164, 6413)
The p.value
being so small, we are able to reject the NULL Hypothesis that the variances of weight
are nearly equal across the two exercise regimes.
- The two variables are not normally distributed.
- The two variances are also significantly different.
This means that the parametric t.test
must be eschewed in favour of the non-parametric wilcox.test
. We will use that, and also attempt linear models
with rank data, and a final permutation test.
Hypothesis
Based on the graphs, how would we formulate our Hypothesis? We wish to infer whether there is difference in mean weight
across gender
. So accordingly:
\[ H_0: \mu_{male} = \mu_{female}\\ \\\ H_a: \mu_{male} \ne \mu_{female}\ \]
Observed and Test Statistic
What would be the test statistic we would use? The difference in means. Is the observed difference in the means between the two groups of scores non-zero? We use the diffmean
function, from mosaic
:
obs_diff_gender <- diffmean(weight ~ gender, data = yrbss_select_gender)
obs_diff_gender
diffmean
11.70089
Inference
help(wilcox.test)
in your Console.Since the data variables do not satisfy the assumption of being normally distributed, and the variances are significantly different, we use the classical wilcox.test
, which implements what we need here: the Mann-Whitney U test:7
The Mann-Whitney test as a test of mean ranks. It first ranks all your values from high to low, computes the mean rank in each group, and then computes the probability that random shuffling of those values between two groups would end up with the mean ranks as far apart as, or further apart, than you observed. No assumptions about distributions are needed so far. (emphasis mine)
Our model would be:
\[ mean(rank(Weight_{male})) - mean(rank(Weight_{female})) = \beta_1;\\ H_0: \beta_1 = 0;\\ \\ H_a: \beta_1 \ne 0 \]
Recall the earlier graph showing ranks of scores against Gender.
wilcox.test(weight ~ gender, data = yrbss_select_gender,
conf.int = TRUE,
conf.level = 0.95) %>%
broom::tidy()
The p.value
is negligible and we are able to reject the NULL hypothesis that the means are equal.
We can apply the linear-model-as-inference interpretation to the ranked data data to implement the non-parametric test as a Linear Model:
\[ lm(rank(weight) \sim gender) = \beta_0 + \beta_1 * gender \\ H_0: \beta_1 = 0\\ \\\ H_a: \beta_1 \ne 0\\ \]
# Create a sign-rank function
#signed_rank <- function(x) {sign(x) * rank(abs(x))}
lm(rank(weight) ~ gender,
data = yrbss_select_gender) %>%
broom::tidy(conf.int = TRUE,
conf.level = 0.95)
lm
Note how the Qual variable was used here in Linear Regression! The gender
variable was treated as a binary βdummyβ variable8.
For the specific data at hand, we need to shuffle the records between Semifinal
and Final
on a per Swimmer basis and take the test statistic
(difference between the two swim records for each swimmer). Another way to look at this is to take the differences between Semifinal
and Final
scores and shuffle the differences to either polarity. We will follow this method in the code below:
# Set graph theme
theme_set(new = theme_custom())
#
null_dist_weight <-
do(9999) * diffmean(data = yrbss_select_gender, weight ~ shuffle(gender))
null_dist_weight
###
gf_histogram(data = null_dist_weight, ~ diffmean, bins = 25) %>%
gf_vline(xintercept = obs_diff_gender, colour = "red")
###
gf_ecdf(data = null_dist_weight, ~ diffmean) %>%
gf_vline(xintercept = obs_diff_gender, colour = "red")
###
prop1(~ diffmean <= obs_diff_gender, data = null_dist_weight)
Clearly the observed_diff_weight
is much beyond anything we can generate with permutations with gender
! And hence there is a significant difference in weights across gender
!
All Tests Together
We can put all the test results together to get a few more insights about the tests:
wilcox.test(weight ~ gender, data = yrbss_select_gender,
conf.int = TRUE,
conf.level = 0.95) %>%
broom::tidy() %>%
gt() %>%
tab_style(
style = list(cell_fill(color = "cyan"), cell_text(weight = "bold")),
locations = cells_body(columns = p.value)) %>%
tab_header(title = "wilcox.test")
lm(rank(weight) ~ gender,
data = yrbss_select_gender) %>%
broom::tidy(conf.int = TRUE,
conf.level = 0.95) %>%
gt() %>%
tab_style(
style = list(cell_fill(color = "cyan"),cell_text(weight = "bold")),
locations = cells_body(columns = p.value)) %>%
tab_header(title = "Linear Model with Ranked Data")
wilcox.test | ||||||
---|---|---|---|---|---|---|
estimate | statistic | p.value | conf.low | conf.high | method | alternative |
-11.33999 | 10808212 | 0 | -11.34003 | -10.87994 | Wilcoxon rank sum test with continuity correction | two.sided |
Linear Model with Ranked Data | ||||||
---|---|---|---|---|---|---|
term | estimate | std.error | statistic | p.value | conf.low | conf.high |
(Intercept) | 4836.157 | 42.52745 | 113.71848 | 0 | 4752.797 | 4919.517 |
gendermale | 2851.246 | 59.55633 | 47.87478 | 0 | 2734.507 | 2967.986 |
The wilcox.test
and the linear model with rank data offer the same results. This is of course not surprising!
Case Study #3: Weight vs Exercise in the YRBSS Survey
Finally, consider the possible relationship between a highschoolerβs weight and their physical activity.
First, letβs create a new variable physical_3plus
, which will be coded as either βyesβ if the student is physically active for at least 3 days a week, and βnoβ if not. Recall that we have several missing data in that column, so we will (sadly) drop these before generating the new variable:
yrbss_select_phy <- yrbss %>%
drop_na(physically_active_7d, weight) %>%
mutate(physical_3plus = if_else(physically_active_7d >= 3, "yes", "no"),
physical_3plus = factor(physical_3plus,
labels = c("yes", "no"),
levels = c("yes", "no"))) %>%
select(weight,physical_3plus)
# Let us check
yrbss_select_phy%>% count(physical_3plus)
Does weight
vary based on whether students exercise on more or less than 3 days a week? (physically_active_7d
>= 3 days)
Inspecting and Charting Data
We can make distribution plots for weight
by physical_3plus
:
# Set graph theme
theme_set(new = theme_custom())
#
gf_boxplot(weight ~ physical_3plus,
fill = ~ physical_3plus,
data = yrbss_select_phy, xlab = "Days of Exercise >=3 ")
###
gf_density(~ weight,
fill = ~ physical_3plus,
data = yrbss_select_phy)
The box plots show how the medians of the two distributions compare, but we can also compare the means of the distributions using the following to first group the data by the physical_3plus
variable, and then calculate the mean weight
in these groups using the mean
function while ignoring missing values by setting the na.rm
argument to TRUE
.
yrbss_select_phy %>%
group_by(physical_3plus) %>%
summarise(mean_weight = mean(weight, na.rm = TRUE))
There is an observed difference, but is this difference large enough to deem it βstatistically significantβ? In order to answer this question we will conduct a hypothesis test. But before that a few more checks on the data:
A. Check for Normality
As stated before, statistical tests for means usually require a couple of checks:
- Are the data normally distributed?
- Are the data variances similar?
Let us also complete a visual check for normality,with plots since we cannot do a shapiro.test
:
yrbss_select_phy %>%
gf_density( ~ weight,
fill = ~ physical_3plus,
alpha = 0.5,
title = "Highschoolers' Weights by Exercise Frequency") %>%
gf_facet_grid(~ physical_3plus) %>%
gf_fitdistr(dist = "dnorm")
##
yrbss_select_phy %>%
gf_qq(~ weight | physical_3plus , color = ~ physical_3plus) %>%
gf_qqline(ylab = "Weight")
Again, not normally distributedβ¦
B. Check for Variances
Let us check if the two variables have similar variances: the var.test
does this for us, with a NULL hypothesis that the variances are not significantly different:
var.test( weight ~ physical_3plus, data = yrbss_select_phy,
conf.int = TRUE,
conf.level = 0.95) %>%
broom::tidy()
# Critical F value
qf(0.975,4021, 8341)
[1] 1.054398
The p.value
states the probability of the data being what it is, assuming the NULL hypothesis that variances were similar. It being so small, we are able to reject this NULL Hypothesis that the variances of weight
are nearly equal across the two exercise frequencies. (Compare the statistic
in the var.test
with the critical F-value)
- The two variables are not normally distributed.
- The two variances are also significantly different.
Hence we will have to use non-parametric tests to infer if the means are similar.
Hypothesis
Based on the graphs, how would we formulate our Hypothesis? We wish to infer whether there is difference in mean weight
across physical_3plus
. So accordingly:
\[ H_0: \mu_{physical-3plus-Yes} = \mu_{physical-3plus-No}\\ \\\ H_a: \mu_{physical-3plus-Yes} \ne \mu_{physical-3plus-No}\\ \]
Observed and Test Statistic
What would be the test statistic we would use? The difference in means. Is the observed difference in the means between the two groups of scores non-zero? We use the diffmean
function, from mosaic
:
obs_diff_phy <- diffmean(weight ~ physical_3plus, data = yrbss_select_phy)
obs_diff_phy
diffmean
-1.774584
Inference
Well, the variables are not normally distributed, and the variances are significantly different so a standard t.test
is not advised. We can still try:
mosaic::t_test(weight ~ physical_3plus,
var.equal = FALSE, # Welch Correction
data = yrbss_select_phy) %>%
broom::tidy()
The p.value
is \(8.9e-08\) ! And the Confidence Interval
is clear of \(0\). So the t.test
gives us good reason to reject the Null Hypothesis that the means are similar. But can we really believe this, given the non-normality of data?
However, we have seen that the data variables are not normally distributed. So a Wilcoxon Test, using signed-ranks, is indicated: (recall the model!)
# For stability reasons, it may be advisable to use rounded data or to set digits.rank = 7, say,
# such that determination of ties does not depend on very small numeric differences (see the example).
wilcox.test(weight ~ physical_3plus,
conf.int = TRUE,
conf.level = 0.95,
data = yrbss_select_phy) %>%
broom::tidy()
The nonparametric wilcox.test
also suggests that the means for weight
across physical_3plus
are significantly different.
Using the Linear Model Interpretation
We can apply the linear-model-as-inference interpretation to the ranked data data to implement the non-parametric test as a Linear Model:
\[ lm(rank(weight) \sim physical.3plus) = \beta_0 + \beta_1 \times physical.3plus \\ H_0: \beta_1 = 0\\ \\\ H_a: \beta_1 \ne 0\\ \]
lm(rank(weight) ~ physical_3plus,
data = yrbss_select_phy) %>%
broom::tidy(conf.int = TRUE,
conf.level = 0.95)
Here too, the linear model using rank
data arrives at a conclusion similar to that of the Mann-Whitney U test.
Using Permutation Tests
For this last Case Study, we will do this in two ways, just for fun: one using our familiar mosaic
package, and the other using the package infer
.
But first, we need to initialize the test, which we will save as obs_diff_**
.
obs_diff_infer <- yrbss_select_phy %>%
infer::specify(weight ~ physical_3plus) %>%
infer::calculate(stat = "diff in means", order = c("yes", "no"))
obs_diff_infer
obs_diff_mosaic <- mosaic::diffmean(~ weight | physical_3plus, data = yrbss_select_phy)
obs_diff_mosaic
obs_diff_phy
diffmean
-1.774584
diffmean
-1.774584
Note that obs_diff_infer
is a 1 X 1 dataframe; obs_diff_mosaic
is a scalar!!
Next, we will work through creating a permutation distribution using tools from the infer package.
In infer
, the specify()
function is used to specify the variables you are considering (notated y ~ x
), and you can use the calculate()
function to specify the stat
istic you want to calculate and the order
of subtraction you want to use. For this hypothesis, the statistic you are searching for is the difference in means, with the order being yes - no
.
After you have calculated your observed statistic, you need to create a permutation distribution. This is the distribution that is created by shuffling the observed weights into new physical_3plus
groups, labeled βyesβ and βnoβ.
We will save the permutation distribution as null_dist
.
null_dist <- yrbss_select_phy %>%
specify(weight ~ physical_3plus) %>%
hypothesize(null = "independence") %>%
generate(reps = 999, type = "permute") %>%
calculate(stat = "diff in means", order = c("yes", "no"))
null_dist
The hypothesize()
function is used to declare what the null hypothesis is. Here, we are assuming that studentβs weight is independent of whether they exercise at least 3 days or not.
We should also note that the type
argument within generate()
is set to "permute"
. This ensures that the statistics calculated by the calculate()
function come from a reshuffling of the data (not a resampling of the data)! Finally, the specify()
and calculate()
steps should look familiar, since they are the same as what we used to find the observed difference in means!
We can visualize this null distribution with the following code:
# Set graph theme
theme_set(new = theme_custom())
#
gf_histogram(data = null_dist, ~ stat) %>%
gf_vline(xintercept = ~ obs_diff_infer$stat, color = "red")
Now that you have calculated the observed statistic and generated a permutation distribution, you can calculate the p-value for your hypothesis test using the function get_p_value()
from the infer package.
null_dist %>%
get_p_value(obs_stat = obs_diff_infer, direction = "two_sided")
What warning message do you get? Why do you think you get this warning message? Let us construct and record a confidence interval for the difference between the weights of those who exercise at least three times a week and those who donβt, and interpret this interval in context of the data.
null_dist %>%
infer::get_confidence_interval(point_estimate = obs_diff_infer, level = 0.95)
We already have the observed difference, obs_diff_mosaic
. Now we generate the null distribution using permutation, with mosaic
:
We can also generate the histogram of the null distribution, compare that with the obs
erved diff
rence and compute the p-value
and confidence intervals
:
# Set graph theme
theme_set(new = theme_custom())
#
gf_histogram(~ diffmean, data = null_dist_mosaic) %>%
gf_vline(xintercept = obs_diff_mosaic, colour = "red")
# p-value
prop(~ diffmean != obs_diff_mosaic, data = null_dist_mosaic)
prop_TRUE
1
# Confidence Intervals for p = 0.95
mosaic::cdata(~ diffmean, p = 0.95, data = null_dist_mosaic)
Your Turn
- Try the
SwimRecords
dataset from themosaicData
package.
Conclusion
We have learnt how to perform inference for independent means.
- We have looked at the conditions that make the regular t.test
possible, and learnt what to do if the conditions of normality and equal variance are not met.
- We have also looked at how these tests can be understood as manifestations of the linear model, with data and sign-ranked data.
References
- Randall Pruim, Nicholas J. Horton, Daniel T. Kaplan, Start Teaching with R
- https://bcs.wiley.com/he-bcs/Books?action=index&itemId=111941654X&bcsId=11307
- https://statsandr.com/blog/wilcoxon-test-in-r-how-to-compare-2-groups-under-the-non-normality-assumption/
R Package Citations
Package | Version | Citation |
---|---|---|
explore | 1.3.1 | @explore |
infer | 1.0.7 | @infer |
openintro | 2.5.0 | @openintro |
resampledata | 0.3.1 | @resampledata |
TeachHist | 0.2.1 | @TeachHist |
TeachingDemos | 2.13 | @TeachingDemos |
Footnotes
https://stats.stackexchange.com/questions/2492/is-normality-testing-essentially-uselessβ©οΈ
https://www.allendowney.com/blog/2023/01/28/never-test-for-normality/β©οΈ
https://www.middleprofessor.com/files/applied-biostatistics_bookdown/_book/intro-linear-models.html#a-linear-model-can-be-fit-to-data-with-continuous-discrete-or-categorical-x-variablesβ©οΈ
https://allendowney.blogspot.com/2016/06/there-is-still-only-one-test.htmlβ©οΈ
https://stats.stackexchange.com/questions/92374/testing-large-dataset-for-normality-how-and-is-it-reliableβ©οΈ
https://en.wikipedia.org/wiki/Dummy_variable_(statistics)β©οΈ