Using Permutation Tests in Undergraduate Stats Class

Using Permutation Tests

Author

Arvind Venkatadri

Published

January 30, 2022

Modified

July 29, 2025

Plot Fonts and Theme

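library(tidyverse) # ggplot2, dplyr, tidyr and the %>% pipe used below
library(mosaic) # do() and shuffle() for the permutation tests
library(systemfonts)
library(showtext)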
## Clean the slate
systemfonts::clear_local_fonts()
systemfonts::clear_registry()
##
showtext_opts(dpi = 96) # set DPI for showtext
sysfonts::font_add(
  family = "Alegreya",
  regular = "../../../fonts/Alegreya-Regular.ttf",
  bold = "../../../fonts/Alegreya-Bold.ttf",
  italic = "../../../fonts/Alegreya-Italic.ttf",
  bolditalic = "../../../fonts/Alegreya-BoldItalic.ttf"
)

sysfonts::font_add(
  family = "Roboto Condensed",
  regular = "../../../fonts/RobotoCondensed-Regular.ttf",
  bold = "../../../fonts/RobotoCondensed-Bold.ttf",
  italic = "../../../fonts/RobotoCondensed-Italic.ttf",
  bolditalic = "../../../fonts/RobotoCondensed-BoldItalic.ttf"
)
showtext_auto(enable = TRUE) # enable showtext
##
theme_custom <- function() {
  font <- "Alegreya" # assign font family up front

  theme_classic(base_size = 14, base_family = font) %+replace% # replace elements we want to change

    theme(
      text = element_text(family = font), # set base font family

      # text elements
      plot.title = element_text( # title
        family = font, # set font family
        size = 24, # set font size
        face = "bold", # bold typeface
        hjust = 0, # left align
        margin = margin(t = 5, r = 0, b = 5, l = 0)
      ), # margin
      plot.title.position = "plot",
      plot.subtitle = element_text( # subtitle
        family = font, # font family
        size = 14, # font size
        hjust = 0, # left align
        margin = margin(t = 5, r = 0, b = 10, l = 0)
      ), # margin

      plot.caption = element_text( # caption
        family = font, # font family
        size = 9, # font size
        hjust = 1
      ), # right align

      plot.caption.position = "plot", # right align

      axis.title = element_text( # axis titles
        family = "Roboto Condensed", # font family
        size = 12
      ), # font size

      axis.text = element_text( # axis text
        family = "Roboto Condensed", # font family
        size = 9
      ), # font size

      axis.text.x = element_text( # margin for axis text
        margin = margin(5, b = 10)
      )

      # since the legend often requires manual tweaking
      # based on plot content, don't define it here
    )
}

```{r}
#| cache: false
#| code-fold: true
## Set the theme (theme_set() comes from ggplot2, so attach it first)
library(ggplot2)
theme_set(new = theme_custom())
```

```{r}
#| cache: false
#| code-fold: true
## Use available fonts in ggplot text geoms too!
ggplot2::update_geom_defaults(geom = "text", new = list(
  family = "Roboto Condensed",
  fontface = "plain", # geom_text() uses `fontface`, not `face`
  size = 3.5,
  colour = "#2b2b2b"
))
```

Introduction

This project is a quick analysis of the Design of Experiments class carried out in the Order and Chaos course, FSP-2021-2022, at SMI MAHE, Bangalore.

The methodology followed was that in A.J. Lawrance’s paper ¹ describing a Statistics module based on the method of Design of Experiments. The inquiry relates to Short Term Memory (STM) among students.

Structure

There were 17 students in total. Eight pairs of students were formed at random, and each pair created a different Test tool for Short Term Memory testing.

The binary (two-level) variables/parameters used in the tests, following Lawrance, were:

WL: Word List Length (7 and 15 words)
SL: Syllables in the Words (2 and 5 syllables)
ST: Study Time allowed for the Respondents (15 and 30 seconds)

Other parameters considered were a) Language b) Structure/Depiction of the Word Lists ( e.g. word clouds, matrices, columns…), c) Whether the words would be shown or read aloud, and d) whether the respondents had to speak out, or write down, the recollected words. These parameters were discussed and abandoned as too complex to mechanize, though they could have had an impact on the STM scores.

Hence a total of 8 Tests were created by 8 pairs of students, and each team tested the remaining 15 students. (Due to COVID restrictions, this testing was carried out entirely online on MS Teams, using individual breakout rooms for the Test Teams.)

The data were entered into a Google Sheet and the STM scores were converted to percentages so as to be comparable across WL.
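
As a small illustration of why this conversion is needed, the sketch below divides raw recall counts by the list length. The counts are made-up values for illustration only, not the class data, and the column names are assumptions. The result is stored as a fraction between 0 and 1, which matches the scale of the scores in the data table.

```r
# Hypothetical raw recall counts (illustrative values, not the class data)
raw_scores <- data.frame(
  list_length = c(7, 7, 15, 15),
  words_recalled = c(5, 3, 4, 2)
)

# Express each score as a fraction of the list length so that
# 7-word and 15-word tests share a common 0-1 scale
raw_scores$stm_score <- raw_scores$words_recalled / raw_scores$list_length
raw_scores
```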

The data were then “flattened” for each of the binary parameters; this was logical to do since, for each parameter, the other two parameters were balanced out by the Test structure. For instance, for WL = 7, the SL and ST parameters took all four combinations of (SL = 2, 5) and (ST = 15, 30). Hence the “common sense” analysis could proceed for each of the parameters individually. Joint effects were not considered for this preliminary class.
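
The balance argument can be checked with a short sketch. It assumes the eight tests form a full 2 x 2 x 2 factorial of the three parameters, which is what eight tests with three two-level parameters suggests; the object names `design` and `balance` are illustrative.

```r
# All 2 x 2 x 2 combinations of the three two-level parameters
# (assumption: the eight tests cover the full factorial)
design <- expand.grid(
  WL = c(7, 15), # word list length
  SL = c(2, 5),  # syllables per word
  ST = c(15, 30) # study time in seconds
)
nrow(design) # one row per test

# Cross-tabulate WL against the (SL, ST) combinations:
# in a balanced design every cell holds the same count
balance <- table(design$WL, interaction(design$SL, design$ST))
balance
```

Because every level of WL sees each (SL, ST) combination equally often, averaging over the other two parameters does not favour either level of WL. The same holds for SL and ST.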

Data

stm <- readxl::read_xlsx("RandomTesters.xlsx",
  sheet = "Data",
  range = "C31:H91"
) %>%
  janitor::clean_names()
stm

syllable_2 <dbl>	syllable_5 <dbl>	study_time_15 <dbl>	study_time_30 <dbl>	list_length_7 <dbl>	list_length_15 <dbl>
0.7142857	0.57142857	0.71428571	0.7142857	0.7142857	0.26670000
0.4285714	0.57142857	0.42857143	0.5714286	0.4285714	0.20000000
0.5714286	0.71428571	0.57142857	0.5714286	0.5714286	0.06670000
0.7142857	1.00000000	0.71428571	0.5714286	0.7142857	0.13330000
0.7142857	1.00000000	0.71428571	1.0000000	0.7142857	0.20000000
0.7142857	0.85714286	0.71428571	0.7142857	0.7142857	0.13330000
0.5714286	0.57142857	0.57142857	0.5714286	0.5714286	0.33340000
0.5714286	0.71428571	0.57142857	0.5714286	0.5714286	0.53330000
0.5714286	0.85714286	0.57142857	1.0000000	0.5714286	0.20000000
0.5714286	1.00000000	0.57142857	0.4285714	0.5714286	0.13330000

The data contain scores that have been combined into a single column for each setting of each parameter. For example, the column syllable_2 contains STM scores for all tests that used SL = 2 syllables. The Word List Length WL and Study Time ST go through all their combinations in this column. The other columns are constructed similarly.

Basic Plots

We will use Box Plots and Density Plots to compare the STM score distributions for each Parameter. To do this we need to pivot_longer the adjacent columns ( e.g. syllable_2 and syllable_5) and use these names as categorical variables:

Syllable Parameter SL

theme(theme_classic())

 Named list()
 - attr(*, "class")= chr [1:2] "theme" "gg"
 - attr(*, "complete")= logi FALSE
 - attr(*, "validate")= logi TRUE

stm_syllable <- stm %>%
  select(contains("syllable")) %>%
  pivot_longer(
    data = .,
    cols = starts_with("syllable"),
    names_to = "syllable",
    values_to = "syl_scores"
  )
stm_syllable

syllable <chr>	syl_scores <dbl>
syllable_2	0.71428571
syllable_5	0.57142857
syllable_2	0.42857143
syllable_5	0.57142857
syllable_2	0.57142857
syllable_5	0.71428571
syllable_2	0.71428571
syllable_5	1.00000000
syllable_2	0.71428571
syllable_5	1.00000000

p1 <- stm_syllable %>%
  ggplot(.) +
  geom_boxplot(
    aes(
      y = syl_scores,
      x = syllable,
      colour = syllable,
      fill = syllable
    ),
    alpha = 0.3
  ) +
  labs(title = "STM scores by Syllables in Test Word Lists")

p2 <- stm_syllable %>%
  ggplot(.) +
  geom_density(aes(x = syl_scores, colour = syllable, fill = syllable), alpha = 0.3) +
  labs(title = "STM scores by Syllables in Test Word Lists")

patchwork::wrap_plots(p1 + p2, guides = "collect")

Study Time Parameter ST

stm_studytime <- stm %>%
  select(contains("study")) %>%
  pivot_longer(
    data = .,
    cols = starts_with("study"),
    names_to = "study",
    values_to = "study_scores"
  )
stm_studytime

study <chr>	study_scores <dbl>
study_time_15	0.71428571
study_time_30	0.71428571
study_time_15	0.42857143
study_time_30	0.57142857
study_time_15	0.57142857
study_time_30	0.57142857
study_time_15	0.71428571
study_time_30	0.57142857
study_time_15	0.71428571
study_time_30	1.00000000

p1 <- stm_studytime %>%
  ggplot(.) +
  geom_boxplot(
    aes(
      y = study_scores,
      x = study,
      colour = study,
      fill = study
    ),
    alpha = 0.3
  ) +
  labs(title = "STM scores by Study Time in Test Word Lists")
p2 <- stm_studytime %>%
  ggplot(.) +
  geom_density(aes(x = study_scores, colour = study, fill = study), alpha = 0.3) +
  labs(title = "STM scores by Study Time in Test Word Lists")

patchwork::wrap_plots(p1 + p2, guides = "collect")

Word List Length Parameter WL

stm_words <- stm %>%
  select(contains("list")) %>%
  pivot_longer(
    data = ., cols = starts_with("list"),
    names_to = "list", values_to = "list_scores"
  )
stm_words

list <chr>	list_scores <dbl>
list_length_7	0.71428571
list_length_15	0.26670000
list_length_7	0.42857143
list_length_15	0.20000000
list_length_7	0.57142857
list_length_15	0.06670000
list_length_7	0.71428571
list_length_15	0.13330000
list_length_7	0.71428571
list_length_15	0.20000000

p1 <- stm_words %>%
  ggplot(.) +
  geom_boxplot(aes(y = list_scores, x = list, colour = list, fill = list), alpha = 0.3) +
  labs(title = "STM scores by Word Count in Test Word Lists")

p2 <- stm_words %>%
  ggplot(.) +
  geom_density(aes(x = list_scores, colour = list, fill = list), alpha = 0.3) +
  labs(title = "STM scores by Word Count in Test Word Lists")

patchwork::wrap_plots(p1 + p2, guides = "collect")

Preliminary Observations

Based on visual inspection of the plots, Word Count appears to have a large effect on STM test scores, with the shorter list (7 words) being easier to recall. Study Time (15 and 30 seconds) seems to have a more modest positive effect, while Syllable Count (2 or 5 syllables) seems to have a modest negative effect on STM scores.

Analysis

We wish to establish the significance of the effect due to each of the Parameters. The Density Plots already suggest that none of the scores are normally distributed. A quick Shapiro-Wilk test on each column (output follows) confirms this: every p-value is below 0.05.

Hence we go for a Permutation Test to check for significance of effect.

Moreover, as remarked in Ernst², the non-parametric permutation test can be exact and is also intuitively easier for students to grasp, as I can testify from direct observation in this class. There is no need to discuss sampling distributions and means, t-tests and the like. Permutations are easily executed in R, using packages such as mosaic³.

shapiro.test(stm$syllable_2)


    Shapiro-Wilk normality test

data:  stm$syllable_2
W = 0.95508, p-value = 0.02716

shapiro.test(stm$syllable_5)


    Shapiro-Wilk normality test

data:  stm$syllable_5
W = 0.95321, p-value = 0.02211

shapiro.test(stm$study_time_15)


    Shapiro-Wilk normality test

data:  stm$study_time_15
W = 0.9068, p-value = 0.0002348

shapiro.test(stm$study_time_30)


    Shapiro-Wilk normality test

data:  stm$study_time_30
W = 0.95539, p-value = 0.0281

shapiro.test(stm$list_length_7)


    Shapiro-Wilk normality test

data:  stm$list_length_7
W = 0.90542, p-value = 0.0002085

shapiro.test(stm$list_length_15)


    Shapiro-Wilk normality test

data:  stm$list_length_15
W = 0.92806, p-value = 0.001645
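
The six calls above can also be run in one pass and collected into a single table. The sketch below uses a simulated stand-in, `stm_demo`, because the spreadsheet is not available here; its values are random and only mimic the layout of `stm`. The names `stm_demo` and `shapiro_results` are illustrative assumptions.

```r
# Simulated stand-in for `stm`: six score columns of random fractions
# (hypothetical data, only the column layout follows the original)
set.seed(2022)
cols <- c(
  "syllable_2", "syllable_5", "study_time_15",
  "study_time_30", "list_length_7", "list_length_15"
)
stm_demo <- as.data.frame(
  setNames(
    replicate(6, sample(0:7, 60, replace = TRUE) / 7, simplify = FALSE),
    cols
  )
)

# Run shapiro.test() on every column and keep W and the p-value
shapiro_results <- data.frame(
  column = names(stm_demo),
  W = vapply(stm_demo, function(x) unname(shapiro.test(x)$statistic), numeric(1)),
  p_value = vapply(stm_demo, function(x) shapiro.test(x)$p.value, numeric(1)),
  row.names = NULL
)
shapiro_results
```

With the real data, replacing `stm_demo` by `stm` should give one row per column, with the same W and p-values as the individual calls.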

Permutation Tests

We proceed with a Permutation Test for each of the Parameters. We start with the Syllable Parameter SL. We shuffle the labels ( SL- = 2 and SL+ = 5) between the scores and determine the null distribution. This is then compared with the difference in mean scores between the unpermuted sets. We continue similarly for the other two parameters.
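
To make the shuffling step concrete, here is a minimal base R sketch of the same idea, independent of mosaic. The helper `perm_diff_means()` and the toy data are hypothetical and serve only to illustrate the procedure; they are not part of the class analysis.

```r
# Permutation test for a difference in means, written in base R.
# `perm_diff_means` is a hypothetical helper, not a mosaic function.
perm_diff_means <- function(scores, labels, n_perm = 10000) {
  labels <- factor(labels)
  stopifnot(nlevels(labels) == 2, length(scores) == length(labels))

  # Difference in means: second level minus first level
  diff_means <- function(lab) {
    mean(scores[lab == levels(labels)[2]]) -
      mean(scores[lab == levels(labels)[1]])
  }

  observed <- diff_means(labels)
  # Shuffle the labels, keep the scores fixed, and recompute each time
  null_dist <- replicate(n_perm, diff_means(sample(labels)))
  list(observed = observed, null_dist = null_dist)
}

# Toy data: two groups of simulated scores with a built-in shift
set.seed(42)
toy <- data.frame(
  group = factor(rep(c("low", "high"), each = 30), levels = c("low", "high")),
  score = c(rnorm(30, mean = 0.5, sd = 0.1), rnorm(30, mean = 0.6, sd = 0.1))
)

res <- perm_diff_means(toy$score, toy$group, n_perm = 2000)
res$observed
summary(res$null_dist)
```

Shuffling breaks any link between label and score, so the shuffled differences show what differences in means look like when the parameter has no effect.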

# Syllable Parameter SL

obs_syl_mean <- mean(stm$syllable_2) - mean(stm$syllable_5)
obs_syl_mean

[1] 0.0153731

null_dist_syllable <-
  do(10000) * diff(mean(
    stm_syllable$syl_scores ~ shuffle(stm_syllable$syllable),
    data = stm_syllable
  ))
head(null_dist_syllable)

	syllable_5 <dbl>
1	0.04039310
2	0.09907452
3	0.03781500
4	-0.05430056
5	-0.04446437
6	-0.02100167

p1 <-
  null_dist_syllable %>%
  ggplot(., aes(x = syllable_5)) +
  geom_histogram(aes(fill = syllable_5 >= obs_syl_mean)) +
  labs(x = "Distribution of Diff in Means under null hypothesis for Syllables")


# Study Time Parameter ST
obs_study_mean <- mean(stm$study_time_30) - mean(stm$study_time_15)
obs_study_mean

[1] 0.08526183

null_dist_studytime <-
  do(10000) * diff(mean(
    stm_studytime$study_scores ~ shuffle(stm_studytime$study),
    data = stm_studytime
  ))
head(null_dist_studytime)

	study_time_30 <dbl>
1	0.03274325
2	0.03496992
3	-0.02887151
4	0.01563373
5	-0.02411944
6	-0.05172389

p2 <- null_dist_studytime %>%
  ggplot(., aes(x = study_time_30)) +
  geom_histogram(aes(fill = study_time_30 >= obs_study_mean)) +
  labs(x = "Distribution of Diff in Means under null hypothesis for Study Time")

# Word List Length Parameter WL
obs_word_mean <- mean(stm$list_length_7) - mean(stm$list_length_15)
obs_word_mean

[1] 0.2887539

null_dist_words <-
  do(10000) * diff(mean(stm_words$list_scores ~ shuffle(stm_words$list), data = stm_words))
head(null_dist_words)

	list_length_7 <dbl>
1	0.01751183
2	0.02132532
3	-0.11463341
4	-0.01361833
5	-0.05688897
6	-0.06065929

p3 <-
  null_dist_words %>%
  ggplot(., aes(x = list_length_7)) +
  geom_histogram(aes(fill = list_length_7 >= obs_word_mean)) +
  labs(x = "Distribution of Diff in Means under null hypothesis for Words")

# patchwork::wrap_plots(p1 + p2 + p3, nrow= 3, guides = "auto")
p1

p2

p3

Conclusions

From the null distribution plots obtained using Permutation Tests, both Study Time (ST) and Word List Length (WL) appear to have significant effects on the Short Term Memory scores: the probability that the observed difference in means is equalled or exceeded by a random permutation of the scores is very low in both cases.

On the other hand, Syllable Count (SL) does not seem to affect the STM scores significantly.
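
The "probability that the observed value is obtained or exceeded" can be turned into an empirical p-value. The sketch below is one common way to do so; `perm_p_value()` is a hypothetical helper and the small null distribution is hand-made for illustration, not taken from the class results.

```r
# Hypothetical helper: empirical p-value from a permutation null distribution.
# The +1 terms count the observed statistic as one of the permutations,
# so the estimate can never be exactly zero.
perm_p_value <- function(null_dist, observed, two_sided = TRUE) {
  if (two_sided) {
    extreme <- sum(abs(null_dist) >= abs(observed))
  } else {
    extreme <- sum(null_dist >= observed)
  }
  (extreme + 1) / (length(null_dist) + 1)
}

# Toy check with a small hand-made null distribution
toy_null <- c(-0.3, -0.1, 0, 0.1, 0.2, 0.4, -0.2, 0.05, -0.05)
perm_p_value(toy_null, observed = 0.25, two_sided = FALSE) # (1 + 1) / (9 + 1) = 0.2
perm_p_value(toy_null, observed = 0.25, two_sided = TRUE)  # (2 + 1) / (9 + 1) = 0.3
```

Applied to the objects computed earlier, a call such as `perm_p_value(null_dist_words$list_length_7, obs_word_mean)` would give the figure for Word List Length, and similarly for the other two parameters.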

References

Footnotes

1. Lawrance, A. J. 1996. “A Design of Experiments Workshop as an Introduction to Statistics.” American Statistician 50 (2): 156–58. doi:10.1080/00031305.1996.10474364.
2. Ernst, Michael D. 2004. “Permutation Methods: A Basis for Exact Inference.” Statistical Science 19 (4): 676–85. doi:10.1214/088342304000000396.
3. Pruim, R., Kaplan, D. T., and Horton, N. J. 2017. “The mosaic Package: Helping Students to ‘Think with Data’ Using R.” The R Journal 9 (1): 77–102. https://journal.r-project.org/archive/2017/RJ-2017-024/index.html.