Likert Plots: Plotting Survey Data
Setting up R Packages
library(tidyverse)
library(mosaic) # Our trusted friend
library(skimr)
library(vcd) # Michael Friendly's package, Visualizing Categorical Data
library(vcdExtra) # Categorical Data Sets
library(ggmosaic) # Mosaic Plots
library(resampledata) # More datasets
library(sjPlot) # Likert Scale Plots
library(sjlabelled) # Creating Labelled Data for Likert Plots
library(ggpubr) # Colours, Themes and new geometries in ggplot
library(ca) # Correspondence Analysis, for use some day
## Making Tables
library(kableExtra) # html styled tables
Introduction
In many business situations, we conduct, say, customer surveys to collect Likert scale data, where several respondents rate a product or a service on a scale of, for example, “Very much like”, “Somewhat like”, “Neutral”, “Dislike”, and “Very much dislike”.
Plots for Survey Data
What does this data look like, and how does one plot it? Let us consider a fictitious example, followed by a real-world dataset.
Case Study-1: A fictitious app Survey dataset
We are a start-up that has an app called QuickEZ for delivery of groceries. We conduct a survey of 200 people at a local store, with the following questions:
- “Have you heard of the QuickEZ app?”
- “Do you use the QuickEZ app?”
- “Do you find it easy to use the QuickEZ app?”
- “Will you continue to use the QuickEZ app?”
where each question is to be answered on a scale of “always”, “often”, “sometimes”, “never”.
Such data may look, for example, as follows:
q1 | q2 | q3 | q4 |
---|---|---|---|
4 | 2 | 4 | 4 |
4 | 3 | 4 | 1 |
3 | 2 | 1 | 2 |
2 | 3 | 4 | 1 |
1 | 1 | 3 | 4 |
1 | 3 | 4 | 4 |
2 | 4 | 2 | 2 |
3 | 3 | 4 | 1 |
1 | 1 | 4 | 4 |
2 | 1 | 4 | 4 |
tibble [200 × 4] (S3: tbl_df/tbl/data.frame)
$ q1: int [1:200] 4 4 3 2 1 1 2 3 1 2 ...
..- attr(*, "label")= Named chr "Have your heard of the QuickEZ app?"
.. ..- attr(*, "names")= chr "q1"
..- attr(*, "labels")= Named num [1:4] 1 2 3 4
.. ..- attr(*, "names")= chr [1:4] "always" "often" "sometimes" "never"
$ q2: int [1:200] 2 3 2 3 1 3 4 3 1 1 ...
..- attr(*, "label")= Named chr "Do you use the QuickEZ app?"
.. ..- attr(*, "names")= chr "q2"
..- attr(*, "labels")= Named num [1:4] 1 2 3 4
.. ..- attr(*, "names")= chr [1:4] "always" "often" "sometimes" "never"
$ q3: int [1:200] 4 4 1 4 3 4 2 4 4 4 ...
..- attr(*, "label")= Named chr "Do you find it easy to use the QuickEZ app?"
.. ..- attr(*, "names")= chr "q3"
..- attr(*, "labels")= Named num [1:4] 1 2 3 4
.. ..- attr(*, "names")= chr [1:4] "always" "often" "sometimes" "never"
$ q4: int [1:200] 4 1 2 1 4 4 2 1 4 4 ...
..- attr(*, "label")= Named chr "Will you continue to use the QuickEZ app?"
.. ..- attr(*, "names")= chr "q4"
..- attr(*, "labels")= Named num [1:4] 1 2 3 4
.. ..- attr(*, "names")= chr [1:4] "always" "often" "sometimes" "never"
The columns here correspond to the 4 questions (q1-q4) and the rows contain the 200 responses, which have been coded as (1:4). Such data is also a form of Categorical data and we need to count and plot counts for each of the survey questions. Such a plot is called a Likert plot and it looks like this:
Based on this chart, it looks like about half the survey respondents have not heard of our app, so we need more publicity. Many do not find it easy to use, so we have serious re-design and user testing to do! But at least those who have managed to get past the hurdles state that they will continue to use the app: it does the job, but we can make it easier to use.
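The counting that sits behind such a chart can be sketched in base R. This is only a minimal sketch on made-up data: the `responses` data frame and its random values are assumptions for illustration, not the QuickEZ survey shown above.

```r
# Hypothetical toy responses coded 1:4 (NOT the survey data shown above)
set.seed(42)
scale_labels <- c("always", "often", "sometimes", "never")
responses <- data.frame(
  q1 = sample(1:4, size = 200, replace = TRUE),
  q2 = sample(1:4, size = 200, replace = TRUE)
)

# Count each response level per question;
# factor() with explicit levels keeps unused levels at zero
counts <- sapply(responses, function(column) {
  table(factor(column, levels = 1:4, labels = scale_labels))
})

# Column-wise percentages: the quantities a Likert plot displays
percentages <- round(100 * prop.table(counts, margin = 2), 1)

counts
percentages
```

Each column of `counts` sums to the number of respondents, and each column of `percentages` sums to roughly 100, which is what the stacked segments of a Likert bar represent.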
Case Study-2: EUROFAM Survey dataset
Here is another example of Likert data from the healthcare industry.
`efc` is a German data set from a European study titled the EUROFAM study, on family care of older people. Following a common protocol, data were collected from national samples of approximately 1,000 family carers (i.e. caregivers) per country and clustered into comparable subgroups to facilitate cross-national analysis. The research questions in this EUROFAM study were:
To what extent do family carers of older people use support services or receive financial allowances across Europe? What kind of supports and allowances do they mainly use?
What are the main difficulties carers experience accessing the services used? What prevents carers from accessing unused supports that they need? What causes them to stop using still-needed services?
In order to improve support provision, what can be understood about the service characteristics considered crucial by carers, and how far are these needs met? and,
Which channels or actors can provide the greatest help in underpinning future policy efforts to improve access to services/supports?
We will select the variables from the `efc` data set that relate to coping (on the part of caregivers), inspect them, and then plot their responses:
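```{r}
#| label: efc_data
#| layout-nrow: 2
#| column: body-outset-right
data(efc, package = "sjPlot")

# First 20 rows of the coping-related columns
efc %>%
  select(dplyr::contains("cop")) %>%
  head(20)

# Structure of the same columns, including variable and value labels
efc %>%
  select(dplyr::contains("cop")) %>%
  str()
```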
'data.frame': 908 obs. of 9 variables:
$ c82cop1: num 3 3 2 4 3 2 4 3 3 3 ...
..- attr(*, "label")= chr "do you feel you cope well as caregiver?"
..- attr(*, "labels")= Named num [1:4] 1 2 3 4
.. ..- attr(*, "names")= chr [1:4] "never" "sometimes" "often" "always"
$ c83cop2: num 2 3 2 1 2 2 2 2 2 2 ...
..- attr(*, "label")= chr "do you find caregiving too demanding?"
..- attr(*, "labels")= Named num [1:4] 1 2 3 4
.. ..- attr(*, "names")= chr [1:4] "Never" "Sometimes" "Often" "Always"
$ c84cop3: num 2 3 1 3 1 3 4 2 3 1 ...
..- attr(*, "label")= chr "does caregiving cause difficulties in your relationship with your friends?"
..- attr(*, "labels")= Named num [1:4] 1 2 3 4
.. ..- attr(*, "names")= chr [1:4] "Never" "Sometimes" "Often" "Always"
$ c85cop4: num 2 3 4 1 2 3 1 1 2 2 ...
..- attr(*, "label")= chr "does caregiving have negative effect on your physical health?"
..- attr(*, "labels")= Named num [1:4] 1 2 3 4
.. ..- attr(*, "names")= chr [1:4] "Never" "Sometimes" "Often" "Always"
$ c86cop5: num 1 4 1 1 2 3 1 1 2 1 ...
..- attr(*, "label")= chr "does caregiving cause difficulties in your relationship with your family?"
..- attr(*, "labels")= Named num [1:4] 1 2 3 4
.. ..- attr(*, "names")= chr [1:4] "Never" "Sometimes" "Often" "Always"
$ c87cop6: num 1 1 1 1 2 2 2 1 1 1 ...
..- attr(*, "label")= chr "does caregiving cause financial difficulties?"
..- attr(*, "labels")= Named num [1:4] 1 2 3 4
.. ..- attr(*, "names")= chr [1:4] "Never" "Sometimes" "Often" "Always"
$ c88cop7: num 2 3 1 1 1 2 4 2 3 1 ...
..- attr(*, "label")= chr "do you feel trapped in your role as caregiver?"
..- attr(*, "labels")= Named num [1:4] 1 2 3 4
.. ..- attr(*, "names")= chr [1:4] "Never" "Sometimes" "Often" "Always"
$ c89cop8: num 3 2 4 2 4 1 1 3 1 1 ...
..- attr(*, "label")= chr "do you feel supported by friends/neighbours?"
..- attr(*, "labels")= Named num [1:4] 1 2 3 4
.. ..- attr(*, "names")= chr [1:4] "never" "sometimes" "often" "always"
$ c90cop9: num 3 2 3 4 4 1 4 3 3 3 ...
..- attr(*, "label")= chr "do you feel caregiving worthwhile?"
..- attr(*, "labels")= Named num [1:4] 1 2 3 4
.. ..- attr(*, "names")= chr [1:4] "never" "sometimes" "often" "always"
The coping-related variables (those with `cop` in their names) have responses on the Likert scale `(1, 2, 3, 4)`, which correspond to `(never, sometimes, often, always)`, and each variable also has a `label` describing it. The labels are actually (and perhaps usually) the questions in the survey.
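As a quick check, these labels can be read back with `get_label()` and `get_labels()` from the `sjlabelled` package. This is a minimal sketch; `cop_items` is just an illustrative name.

```r
library(dplyr)
library(sjlabelled)

data(efc, package = "sjPlot")

# Keep only the coping-related columns (names containing "cop")
cop_items <- efc %>%
  select(contains("cop"))

# Variable labels: one survey question per column
get_label(cop_items)

# Value labels of the first coping item: the text attached to the codes 1:4
get_labels(cop_items$c82cop1)
```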
We can plot this data using the `plot_likert` function from the `sjPlot` package:
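A minimal sketch of such a call, assuming mostly default settings (the title text and the object name `cop_plot` are illustrative choices, not taken from the original chunk):

```r
library(dplyr)
library(sjPlot)

data(efc, package = "sjPlot")

# Select the nine coping items and draw one bar per survey question
cop_plot <- efc %>%
  select(contains("cop")) %>%
  plot_likert(title = "Coping items in the efc data")

cop_plot
```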
Many questions here have strong negative responses. This may indicate that policy and publicity related efforts may be required.
One could prefer (as I do) that the “often” and “always” scores be towards the right and the “sometimes” and “never” scores towards the left. One can do this within the `plot_likert` command using:
plot_likert(..., reverse.scale = TRUE)
If you want the colours to be reversed, then use:
plot_likert(..., reverse.colors = TRUE)
Try these options now in your Console! (Note the American spelling `color`.)
Labelled Data
Note how the y-axis has been populated with the survey questions: this is an example of a labelled dataset, where the variables not only have names (i.e. column names) but also longish text labels that add information to the data variables. The data values (i.e. scores) in the columns are also labelled as per the Likert scale (Like/Dislike/Strongly Dislike, or never/sometimes/often/always, etc.). These Likert scores are usually a set of contiguous integers.
A variable label is a human-readable description of the variable. R supports rather long variable names, and these names can even contain spaces and punctuation, but short variable names make coding easier. A variable label can give a nice, long description of the variable, which makes it easier to remember what those variable names refer to.
Value labels are similar to variable labels, but value labels are descriptions of the values a variable can take. Labelling values means we don’t have to remember if “1 = Extremely poor” and “7 = Excellent” or vice versa. We can easily get a dataset description and a summary of the variables with the info function.
Let us manually create one such dataset, since it is a common-enough situation [1] that we have survey data and then have to label the variables and the values before plotting. We will use the R package `sjlabelled` to label our data [2].
It is also possible to label the tibble, the columns, and the values in similar fashion using the `labelr` package [3].
# library(sjlabelled)
# Set graph theme
# The original calls theme_custom(), a helper that is not defined in this module;
# theme_minimal() is used here as a stand-in (an assumption).
theme_set(new = theme_minimal())

variable_labels <- c("Do you practice Analytics?",
                     "Do you code in R?",
                     "Have you published your R Code?",
                     "Do you use Quarto as your Workflow in R?",
                     "Will you use R at Work?")
value_labels <- c("never", "sometimes", "often", "always") # numerically 1:4

# Create toy survey data:
# 200 responses to 5 questions, on the Likert scale
# 1:4 = ("never", "sometimes", "often", "always")
my_survey_data <-
  tibble(
    q1 = mosaic::sample(1:4, replace = TRUE, size = 200,
                        prob = c(0.2, 0.2, 0.5, 0.1)),
    q2 = mosaic::sample(1:4, replace = TRUE, size = 200,
                        prob = c(0.3, 0.3, 0.3, 0.1)),
    q3 = mosaic::sample(1:4, replace = TRUE, size = 200,
                        prob = c(0.2, 0.1, 0.1, 0.6)),
    q4 = mosaic::sample(1:4, replace = TRUE, size = 200,
                        prob = c(0.4, 0.2, 0.1, 0.3)),
    q5 = mosaic::sample(1:4, replace = TRUE, size = 200,
                        prob = c(0.1, 0.2, 0.5, 0.2))
  ) %>%
  # Set VARIABLE labels
  sjlabelled::set_label(x = ., label = variable_labels) %>%
  # Now set VALUE labels
  sjlabelled::set_labels(x = ., labels = value_labels)

# Peek at the first rows, then inspect the label attributes
head(my_survey_data, 6)
str(my_survey_data)
tibble [200 × 5] (S3: tbl_df/tbl/data.frame)
$ q1: int [1:200] 1 3 1 3 3 1 3 4 3 1 ...
..- attr(*, "label")= Named chr "Do you practice Analytics?"
.. ..- attr(*, "names")= chr "q1"
..- attr(*, "labels")= Named num [1:4] 1 2 3 4
.. ..- attr(*, "names")= chr [1:4] "never" "sometimes" "often" "always"
$ q2: int [1:200] 3 1 3 2 1 2 3 3 1 1 ...
..- attr(*, "label")= Named chr "Do you code in R?"
.. ..- attr(*, "names")= chr "q2"
..- attr(*, "labels")= Named num [1:4] 1 2 3 4
.. ..- attr(*, "names")= chr [1:4] "never" "sometimes" "often" "always"
$ q3: int [1:200] 3 2 1 4 4 1 2 2 4 4 ...
..- attr(*, "label")= Named chr "Have you published your R Code?"
.. ..- attr(*, "names")= chr "q3"
..- attr(*, "labels")= Named num [1:4] 1 2 3 4
.. ..- attr(*, "names")= chr [1:4] "never" "sometimes" "often" "always"
$ q4: int [1:200] 1 1 1 4 1 4 1 1 1 1 ...
..- attr(*, "label")= Named chr "Do you use Quarto as your Workflow in R?"
.. ..- attr(*, "names")= chr "q4"
..- attr(*, "labels")= Named num [1:4] 1 2 3 4
.. ..- attr(*, "names")= chr [1:4] "never" "sometimes" "often" "always"
$ q5: int [1:200] 2 4 3 1 3 2 3 3 3 1 ...
..- attr(*, "label")= Named chr "Will you use R at Work?"
.. ..- attr(*, "names")= chr "q5"
..- attr(*, "labels")= Named num [1:4] 1 2 3 4
.. ..- attr(*, "names")= chr [1:4] "never" "sometimes" "often" "always"
plot_likert(my_survey_data,
  title = "Summary of Analytics Questionnaire",
  reverse.scale = TRUE,   # Reverse score values on plot
  reverse.colors = FALSE, # Let the colors be
  show.prc.sign = TRUE,   # Show percentage sign
  legend.pos = "bottom"
)
It seems many people in the survey plan to use R at work, and many have published R code as well! Quarto, however, seems to have mixed results. But of course, this is a toy dataset!
So there we are with Survey data analysis and plots!
There are a few other plots for this type of data, which are useful in very specialized circumstances. One example is the agreement plot, which captures the agreement between two (sets of) evaluators on ratings given on a shared ordinal scale to a set of items. An example from the field of medical diagnosis is the opinions of two specialists on a common set of patients. However, that is for a more advanced course!
Conclusion
How are the Likert Plots for survey data different from Bar Plots? Not very much, inherently: we can view the Likert Charts as a set of stacked bar charts based on Likert-scale response counts. At a pinch, we can make a Likert Plot with vanilla bar graphs, but the elegance and power of the packages `sjPlot` and `sjlabelled` are undeniable.
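To make the stacked-bar view concrete, here is a minimal sketch using plain `ggplot2`. It runs on made-up data: the `toy_survey` tibble and its random responses are assumptions for illustration only. The bars here simply run from 0% to 100%; no attempt is made to reproduce the layout that `plot_likert` gives.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

set.seed(2022)
value_labels <- c("never", "sometimes", "often", "always")

# Hypothetical toy responses coded 1:4: three questions, 200 respondents
toy_survey <- tibble(
  q1 = sample(1:4, size = 200, replace = TRUE),
  q2 = sample(1:4, size = 200, replace = TRUE),
  q3 = sample(1:4, size = 200, replace = TRUE)
)

# Long format: one row per (question, response) pair
toy_long <- toy_survey %>%
  pivot_longer(cols = everything(),
               names_to = "question", values_to = "score") %>%
  mutate(response = factor(score, levels = 1:4, labels = value_labels))

# One horizontal bar per question, stacked by response, scaled to proportions
stacked_plot <- ggplot(toy_long, aes(y = question, fill = response)) +
  geom_bar(position = "fill") +
  scale_x_continuous(labels = scales::percent) +
  labs(x = "Share of responses", y = NULL, fill = "Response")

stacked_plot
```

The reshaping, relabelling and scaling done by hand here are the steps that labelled data and `plot_likert` take care of in a single call.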
Your Turn
- Take some of the categorical datasets from the `vcd` and `vcdExtra` packages and recreate the plots from this module. Go to https://vincentarelbundock.github.io/Rdatasets/articles/data.html and type “vcd” in the search box. You can directly load CSV files from there, using `read_csv("url-to-csv")`.
- Including Edible Insects in our Diet! There are several questions here for each “area” of preference for edible insects: experience, fear, concern for the environment, etc. Take all the columns marked as average as your data for your Likert Plot.
R Package Citations
Package | Version | Citation |
---|---|---|
ggmosaic | 0.3.3 | Jeppson, Hofmann, and Cook (2021) |
ggpubr | 0.6.0 | Kassambara (2023) |
janitor | 2.2.0 | Firke (2023) |
kableExtra | 1.4.0 | Zhu (2024) |
resampledata | 0.3.1 | Chihara and Hesterberg (2018) |
sjlabelled | 1.2.0 | Lüdecke (2022) |
sjPlot | 2.8.16 | Lüdecke (2024) |
vcd | 1.4.12 | Meyer, Zeileis, and Hornik (2006); Zeileis, Meyer, and Hornik (2007); Meyer et al. (2023) |
vcdExtra | 0.8.5 | Friendly (2023) |
Footnotes
1. Piping Hot Data: Leveraging Labelled Data in R, https://www.pipinghotdata.com/posts/2020-12-23-leveraging-labelled-data-in-r/
2. Label Support in R: https://cran.r-project.org/web/packages/sjlabelled/index.html
3. Using the `labelr` package: https://cran.r-project.org/web/packages/labelr/vignettes/labelr-introduction.html
Citation
@online{v.2022,
author = {V., Arvind},
title = {{Likert} {Plots:} {Plotting} {Survey} {Data}},
date = {2022-12-27},
url = {https://av-quarto.netlify.app/content/courses/Analytics/Descriptive/Modules/45-SurveyData},
langid = {en},
abstract = {Surveys, Questions, and Responses}
}