πŸ‰ Likert Plots: Plotting Survey Data

Proportions
Likert Scale data
Author

Arvind V.

Published

December 27, 2022

Modified

June 26, 2024

Abstract
Surveys, Questions, and Responses

Setting up R Packages

library(tidyverse)
library(mosaic) # Our trusted friend
library(skimr)
library(vcd) # Michael Friendly's package, Visualizing Categorical Data
library(vcdExtra) # Categorical Data Sets
library(ggmosaic) # Mosaic Plots
library(resampledata) # More datasets

library(sjPlot) # Likert Scale Plots
library(sjlabelled) # Creating Labelled Data for Likert Plots

library(ggpubr) # Colours, Themes and new geometries in ggplot
library(ca) # Correspondence Analysis, for use some day

## Making Tables
library(kableExtra) # html styled tables

Introduction

In many business situations, we conduct surveys (of customers, say) that yield Likert scale data: several respondents rate a product or a service on an ordered scale such as “Like very much”, “Like somewhat”, “Neutral”, “Dislike”, and “Dislike very much”.

Plots for Survey Data

What does this data look like, and how does one plot it? Let us consider a fictitious example, followed by a real-world dataset.

Case Study-1: A fictitious app Survey dataset

A fictitious QuickEZ app

We are a start-up that has an app called QuickEZ for delivery of groceries. We conduct a survey of 200 people at a local store, with the following questions:

  1. “Have you heard of the QuickEZ app?”
  2. “Do you use the QuickEZ app?”
  3. “Do you find it easy to use the QuickEZ app?”
  4. “Will you continue to use the QuickEZ app?”

where each question is to be answered on a scale of “always”, “often”, “sometimes”, “never”.

Such data might look like this:

First 10 Responses
q1 q2 q3 q4
4 2 4 4
4 3 4 1
3 2 1 2
2 3 4 1
1 1 3 4
1 3 4 4
2 4 2 2
3 3 4 1
1 1 4 4
2 1 4 4
tibble [200 × 4] (S3: tbl_df/tbl/data.frame)
 $ q1: int [1:200] 4 4 3 2 1 1 2 3 1 2 ...
  ..- attr(*, "label")= Named chr "Have your heard of the QuickEZ app?"
  .. ..- attr(*, "names")= chr "q1"
  ..- attr(*, "labels")= Named num [1:4] 1 2 3 4
  .. ..- attr(*, "names")= chr [1:4] "always" "often" "sometimes" "never"
 $ q2: int [1:200] 2 3 2 3 1 3 4 3 1 1 ...
  ..- attr(*, "label")= Named chr "Do you use the QuickEZ app?"
  .. ..- attr(*, "names")= chr "q2"
  ..- attr(*, "labels")= Named num [1:4] 1 2 3 4
  .. ..- attr(*, "names")= chr [1:4] "always" "often" "sometimes" "never"
 $ q3: int [1:200] 4 4 1 4 3 4 2 4 4 4 ...
  ..- attr(*, "label")= Named chr "Do you find it easy to use the QuickEZ app?"
  .. ..- attr(*, "names")= chr "q3"
  ..- attr(*, "labels")= Named num [1:4] 1 2 3 4
  .. ..- attr(*, "names")= chr [1:4] "always" "often" "sometimes" "never"
 $ q4: int [1:200] 4 1 2 1 4 4 2 1 4 4 ...
  ..- attr(*, "label")= Named chr "Will you continue to use the QuickEZ app?"
  .. ..- attr(*, "names")= chr "q4"
  ..- attr(*, "labels")= Named num [1:4] 1 2 3 4
  .. ..- attr(*, "names")= chr [1:4] "always" "often" "sometimes" "never"

The columns here correspond to the 4 questions (q1-q4) and the rows contain the 200 responses, which have been coded as (1:4). Such data is also a form of categorical data: for each survey question, we count how many respondents chose each response and then plot those counts. Such a plot is called a Likert plot, and it looks like this:

Figure 1

Based on this chart, about half the survey respondents have not heard of our app, so we need more publicity. Many do not find it easy to use 😿, so we have serious re-design and user testing to do! But at least those who have managed to get past the hurdles say they will continue to use the app: it does the job, but we can make it easier to use.
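The counting step behind such a chart can be sketched in base R. This is only an illustration: it uses the first 10 responses listed in the table above rather than all 200, and the object names (responses, scale_labels, counts, percentages) are made up for this example.

```r
# First 10 responses from the table above, coded 1:4
responses <- data.frame(
  q1 = c(4, 4, 3, 2, 1, 1, 2, 3, 1, 2),
  q2 = c(2, 3, 2, 3, 1, 3, 4, 3, 1, 1),
  q3 = c(4, 4, 1, 4, 3, 4, 2, 4, 4, 4),
  q4 = c(4, 1, 2, 1, 4, 4, 2, 1, 4, 4)
)

# Value labels for the codes 1:4, as shown in the str() output above
scale_labels <- c("always", "often", "sometimes", "never")

# Count every response level for one question;
# factor() keeps levels that happen to have zero responses
count_levels <- function(x) {
  table(factor(x, levels = 1:4, labels = scale_labels))
}

# One column of counts per question (rows = response levels)
counts <- sapply(responses, count_levels)

# Convert the counts to column-wise percentages
percentages <- 100 * prop.table(counts, margin = 2)

print(counts)
print(round(percentages, 1))
```

A Likert plot draws one horizontal stacked bar per column of such a percentage table.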

Case Study-2: EUROFAM Survey dataset

Here is another example of Likert data from the healthcare industry.

efc is a German data set from EUROFAM, a European study on family care of older people. Following a common protocol, data were collected from national samples of approximately 1,000 family carers (i.e. caregivers) per country and clustered into comparable subgroups to facilitate cross-national analysis. The research questions in this EUROFAM study were:

  1. To what extent do family carers of older people use support services or receive financial allowances across Europe? What kind of supports and allowances do they mainly use?

  2. What are the main difficulties carers experience accessing the services used? What prevents carers from accessing unused supports that they need? What causes them to stop using still-needed services?

  3. In order to improve support provision, what can be understood about the service characteristics considered crucial by carers, and how far are these needs met? and,

  4. Which channels or actors can provide the greatest help in underpinning future policy efforts to improve access to services/supports?

We will select the variables from the efc data set that relate to coping (on the part of caregivers), inspect them, and then plot the responses:

```{r}
#| label: efc_data
#| layout-nrow: 2
#| column: body-outset-right
data(efc,package = "sjPlot")

efc %>% 
  select(dplyr::contains("cop")) %>% 
  head(20) 
##
efc %>% 
  select(dplyr::contains("cop")) %>% 
  str()
```
'data.frame':   908 obs. of  9 variables:
 $ c82cop1: num  3 3 2 4 3 2 4 3 3 3 ...
  ..- attr(*, "label")= chr "do you feel you cope well as caregiver?"
  ..- attr(*, "labels")= Named num [1:4] 1 2 3 4
  .. ..- attr(*, "names")= chr [1:4] "never" "sometimes" "often" "always"
 $ c83cop2: num  2 3 2 1 2 2 2 2 2 2 ...
  ..- attr(*, "label")= chr "do you find caregiving too demanding?"
  ..- attr(*, "labels")= Named num [1:4] 1 2 3 4
  .. ..- attr(*, "names")= chr [1:4] "Never" "Sometimes" "Often" "Always"
 $ c84cop3: num  2 3 1 3 1 3 4 2 3 1 ...
  ..- attr(*, "label")= chr "does caregiving cause difficulties in your relationship with your friends?"
  ..- attr(*, "labels")= Named num [1:4] 1 2 3 4
  .. ..- attr(*, "names")= chr [1:4] "Never" "Sometimes" "Often" "Always"
 $ c85cop4: num  2 3 4 1 2 3 1 1 2 2 ...
  ..- attr(*, "label")= chr "does caregiving have negative effect on your physical health?"
  ..- attr(*, "labels")= Named num [1:4] 1 2 3 4
  .. ..- attr(*, "names")= chr [1:4] "Never" "Sometimes" "Often" "Always"
 $ c86cop5: num  1 4 1 1 2 3 1 1 2 1 ...
  ..- attr(*, "label")= chr "does caregiving cause difficulties in your relationship with your family?"
  ..- attr(*, "labels")= Named num [1:4] 1 2 3 4
  .. ..- attr(*, "names")= chr [1:4] "Never" "Sometimes" "Often" "Always"
 $ c87cop6: num  1 1 1 1 2 2 2 1 1 1 ...
  ..- attr(*, "label")= chr "does caregiving cause financial difficulties?"
  ..- attr(*, "labels")= Named num [1:4] 1 2 3 4
  .. ..- attr(*, "names")= chr [1:4] "Never" "Sometimes" "Often" "Always"
 $ c88cop7: num  2 3 1 1 1 2 4 2 3 1 ...
  ..- attr(*, "label")= chr "do you feel trapped in your role as caregiver?"
  ..- attr(*, "labels")= Named num [1:4] 1 2 3 4
  .. ..- attr(*, "names")= chr [1:4] "Never" "Sometimes" "Often" "Always"
 $ c89cop8: num  3 2 4 2 4 1 1 3 1 1 ...
  ..- attr(*, "label")= chr "do you feel supported by friends/neighbours?"
  ..- attr(*, "labels")= Named num [1:4] 1 2 3 4
  .. ..- attr(*, "names")= chr [1:4] "never" "sometimes" "often" "always"
 $ c90cop9: num  3 2 3 4 4 1 4 3 3 3 ...
  ..- attr(*, "label")= chr "do you feel caregiving worthwhile?"
  ..- attr(*, "labels")= Named num [1:4] 1 2 3 4
  .. ..- attr(*, "names")= chr [1:4] "never" "sometimes" "often" "always"

The coping-related variables have responses on the Likert scale (1, 2, 3, 4), which correspond to (never, sometimes, often, always), and each variable also carries a label that describes it. These labels are actually (and perhaps usually) the questions in the survey.
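To see how such labels are stored, here is a minimal base R sketch. It builds a short vector that mimics c82cop1 (the first four values, the question, and the value labels are copied from the output above) and reads the attributes back with attr(). The object names cope, question, value_labels, and cope_text are made up for illustration.

```r
# A small stand-in for efc$c82cop1: the first four responses
cope <- c(3, 3, 2, 4)

# Variable label: the survey question
attr(cope, "label") <- "do you feel you cope well as caregiver?"

# Value labels: a named numeric vector mapping text to codes
attr(cope, "labels") <- c(never = 1, sometimes = 2, often = 3, always = 4)

# Read the labels back
question <- attr(cope, "label")
value_labels <- attr(cope, "labels")

# Translate each numeric code into its text label
cope_text <- names(value_labels)[match(cope, value_labels)]

print(question)
print(cope_text)
```

The labels travel with the data as plain attributes, so a plotting function can pick them up without any extra arguments.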

We can plot this data using the plot_likert function from package sjPlot:

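# Set graph theme
# Note: theme_custom() is not a ggplot2 function; it is assumed to be
# a theme defined elsewhere by the author
theme_set(new = theme_custom())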

efc %>% 
  select(dplyr::contains("cop")) %>% 
  sjPlot::plot_likert(title = "Caregiver Survey from EUROFAM") 

Many questions here have strong negative responses. This may indicate that policy and publicity related efforts may be required.

Colours and Orientation in the Likert Plot

One could prefer (as I do) that the “often” and “always” scores sit towards the right and the “sometimes” and “never” scores towards the left. One can do this within the plot_likert command using:

plot_likert(..., reverse.scale = TRUE)

If you want the colours to be reversed, then:

plot_likert(..., reverse.colors = TRUE)

Try these options now in your Console! (Note the American spelling color)

Labelled Data

Note how the y-axis has been populated with the survey questions: this is an example of a labelled dataset, where the variables not only have names (i.e. column names) but also longish text labels that add information about them. The data values (i.e. scores) in the columns are also labelled as per the Likert scale (Like/Dislike/Strongly Dislike or never/sometimes/often/always, etc.). These Likert scores are usually a set of contiguous integers.

Variable Labels and Value Labels

A variable label is a human-readable description of the variable. R supports rather long variable names, and these names can even contain spaces and punctuation, but short variable names make coding easier. A variable label can give a nice, long description of the variable, which makes it easier to remember what those variable names refer to.

Value labels are similar to variable labels, but they describe the values a variable can take. Labelling values means we don’t have to remember whether 1 = Extremely poor and 7 = Excellent or vice versa. We can easily get a description of the dataset and a summary of its variables with the info function.

Let us manually create one such dataset, since it is a common-enough situation¹ that we have survey data and then have to label the variables and the values before plotting. We will use the R package sjlabelled to label our data.²

It is also possible to label the tibble, the columns, and the values in similar fashion using the labelr package.³

#library(sjlabelled)

# Set graph theme
theme_set(new = theme_custom())

variable_labels <- c("Do you practice Analytics?",
                     "Do you code in R?",
                     "Have you published your R Code?",
                     "Do you use Quarto as your Workflow in R?",
                     "Will you use R at Work?")
value_labels = c("never", "sometimes","often","always") #numerically 1:4

my_survey_data <- 
  # Create toy survey data
  # 200 responses to 5 questions
  # responses on Likert Scale
  # 1:4 = "never", "sometimes","often","always")

  tibble(q1 = mosaic::sample(1:4, replace = TRUE, size = 200,
                             prob = c(0.2, 0.2, 0.5, 0.1)),
         q2 = mosaic::sample(1:4, replace = TRUE, size = 200,
                             prob = c(0.3, 0.3, 0.3, 0.1)),
         q3 = mosaic::sample(1:4, replace = TRUE, size = 200,
                             prob = c(0.2, 0.1, 0.1, 0.6)),
         q4 = mosaic::sample(1:4, replace = TRUE, size = 200,
                             prob = c(0.4, 0.2, 0.1, 0.3)),
         q5 = mosaic::sample(1:4, replace = TRUE, size = 200,
                             prob = c(0.1, 0.2, 0.5, 0.2))) %>%
  
  # Set VARIABLE labels
  sjlabelled::set_label(x = .,
                        label = variable_labels) %>%
  
  # Now set VALUE labels
  sjlabelled::set_labels(x = ., labels = value_labels)
###
head(my_survey_data, 6)
###
str(my_survey_data)
tibble [200 × 5] (S3: tbl_df/tbl/data.frame)
 $ q1: int [1:200] 1 3 1 3 3 1 3 4 3 1 ...
  ..- attr(*, "label")= Named chr "Do you practice Analytics?"
  .. ..- attr(*, "names")= chr "q1"
  ..- attr(*, "labels")= Named num [1:4] 1 2 3 4
  .. ..- attr(*, "names")= chr [1:4] "never" "sometimes" "often" "always"
 $ q2: int [1:200] 3 1 3 2 1 2 3 3 1 1 ...
  ..- attr(*, "label")= Named chr "Do you code in R?"
  .. ..- attr(*, "names")= chr "q2"
  ..- attr(*, "labels")= Named num [1:4] 1 2 3 4
  .. ..- attr(*, "names")= chr [1:4] "never" "sometimes" "often" "always"
 $ q3: int [1:200] 3 2 1 4 4 1 2 2 4 4 ...
  ..- attr(*, "label")= Named chr "Have you published your R Code?"
  .. ..- attr(*, "names")= chr "q3"
  ..- attr(*, "labels")= Named num [1:4] 1 2 3 4
  .. ..- attr(*, "names")= chr [1:4] "never" "sometimes" "often" "always"
 $ q4: int [1:200] 1 1 1 4 1 4 1 1 1 1 ...
  ..- attr(*, "label")= Named chr "Do you use Quarto as your Workflow in R?"
  .. ..- attr(*, "names")= chr "q4"
  ..- attr(*, "labels")= Named num [1:4] 1 2 3 4
  .. ..- attr(*, "names")= chr [1:4] "never" "sometimes" "often" "always"
 $ q5: int [1:200] 2 4 3 1 3 2 3 3 3 1 ...
  ..- attr(*, "label")= Named chr "Will you use R at Work?"
  .. ..- attr(*, "names")= chr "q5"
  ..- attr(*, "labels")= Named num [1:4] 1 2 3 4
  .. ..- attr(*, "names")= chr [1:4] "never" "sometimes" "often" "always"
plot_likert(my_survey_data, 
            title = "Summary of Analytics Questionnaire",
            reverse.scale = TRUE,# Reverse score values on plot
            reverse.colors = FALSE, # let the colors be
            show.prc.sign = TRUE, # Show percentage sign
            legend.pos = "bottom")

It seems many people in the survey plan to use R at work, and many have published R code as well. Quarto, however, gets mixed results. But of course this is a toy dataset!

So there we are with Survey data analysis and plots!

There are a few other plots for this type of data, which are useful in very specialized circumstances. One example is the agreement plot, which captures the agreement between two (sets of) evaluators on ratings given on a shared ordinal scale to a set of items. An example from the field of medical diagnosis is the opinions of two specialists on a common set of patients. However, that is for a more advanced course!

Conclusion

How are the Likert Plots for Survey data different from Bar Plots? Not very much inherently; we can view the Likert Charts as a set of stacked bar charts, based on Likert-scale response counts. At a pinch we can make a Likert Plot with vanilla bar graphs, but the elegance and power of the packages sjPlot and sjlabelled is undeniable.
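As a rough illustration of the "vanilla bar graph" route, the sketch below draws horizontal stacked bars with base R's barplot(). The data are simulated, the seed is arbitrary, and the names toy, pct, and bar_midpoints are invented for this example; it is a bare-bones approximation, not a reproduction of what plot_likert draws.

```r
set.seed(42)  # arbitrary seed, only so the toy data are reproducible

scale_labels <- c("never", "sometimes", "often", "always")

# Simulated responses: 200 respondents, 3 questions, coded 1:4
toy <- data.frame(
  q1 = sample(1:4, 200, replace = TRUE, prob = c(0.2, 0.2, 0.5, 0.1)),
  q2 = sample(1:4, 200, replace = TRUE, prob = c(0.3, 0.3, 0.3, 0.1)),
  q3 = sample(1:4, 200, replace = TRUE, prob = c(0.2, 0.1, 0.1, 0.6))
)

# Percentage of each response level per question
# (rows = response levels, columns = questions)
pct <- sapply(toy, function(x) {
  100 * prop.table(table(factor(x, levels = 1:4, labels = scale_labels)))
})

# One horizontal stacked bar per question; legend taken from row names
bar_midpoints <- barplot(
  pct,
  horiz = TRUE,
  legend.text = TRUE,
  xlab = "Percentage of responses",
  main = "Toy survey as stacked bars"
)
```

This gets the proportions across, but the question text, the percentage labels on the bars, and the orientation options all have to be added by hand, which is where sjPlot and sjlabelled save the effort.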

Your Turn

  1. Take some of the categorical datasets from the vcd and vcdExtra packages and recreate the plots from this module. Go to https://vincentarelbundock.github.io/Rdatasets/articles/data.html and type “vcd” in the search box. You can directly load CSV files from there, using read_csv("url-to-csv").

  2. Including Edible Insects in our Diet!

There are several questions here for each “area” of preference for edible insects: experience, fear, concern for the environment, etc. Take all the columns marked as average as your data for your Likert Plot.

References

  1. Mine Cetinkaya-Rundel and Johanna Hardin. An Introduction to Modern Statistics, Chapter 4. https://openintro-ims.netlify.app/explore-categorical.html

  2. Using the strucplot command from vcd, https://cran.r-project.org/web/packages/vcd/vignettes/strucplot.pdf

  3. Creating Frequency Tables with vcd, https://cran.r-project.org/web/packages/vcdExtra/vignettes/A_creating.html

  4. Creating mosaic plots with vcd, https://cran.r-project.org/web/packages/vcdExtra/vignettes/D_mosaics.html

  5. Michael Friendly, Corrgrams: Exploratory displays for correlation matrices. The American Statistician August 19, 2002 (v1.5). https://www.datavis.ca/papers/corrgram.pdf

  6. Visualizing Categorical Data in R

  7. H. Riedwyl & M. Schüpbach (1994), Parquet diagram to plot contingency tables. In F. Faulbaum (ed.), Softstat ’93: Advances in Statistical Software, 293–299. Gustav Fischer, New York.

R Package Citations
Package Version Citation
ggmosaic 0.3.3 Jeppson, Hofmann, and Cook (2021)
ggpubr 0.6.0 Kassambara (2023)
janitor 2.2.0 Firke (2023)
kableExtra 1.4.0 Zhu (2024)
resampledata 0.3.1 Chihara and Hesterberg (2018)
sjlabelled 1.2.0 Lüdecke (2022)
sjPlot 2.8.16 Lüdecke (2024)
vcd 1.4.12 Meyer, Zeileis, and Hornik (2006); Zeileis, Meyer, and Hornik (2007); Meyer et al. (2023)
vcdExtra 0.8.5 Friendly (2023)
Chihara, Laura M., and Tim C. Hesterberg. 2018. Mathematical Statistics with Resampling and R. 2nd ed. Hoboken, NJ: John Wiley & Sons. https://sites.google.com/site/chiharahesterberg/home.
Firke, Sam. 2023. janitor: Simple Tools for Examining and Cleaning Dirty Data. https://CRAN.R-project.org/package=janitor.
Friendly, Michael. 2023. vcdExtra: “vcd” Extensions and Additions. https://CRAN.R-project.org/package=vcdExtra.
Jeppson, Haley, Heike Hofmann, and Di Cook. 2021. ggmosaic: Mosaic Plots in the “ggplot2” Framework. https://CRAN.R-project.org/package=ggmosaic.
Kassambara, Alboukadel. 2023. ggpubr: “ggplot2” Based Publication Ready Plots. https://CRAN.R-project.org/package=ggpubr.
Lüdecke, Daniel. 2022. sjlabelled: Labelled Data Utility Functions (Version 1.2.0). https://doi.org/10.5281/zenodo.1249215.
Lüdecke, Daniel. 2024. sjPlot: Data Visualization for Statistics in Social Science. https://CRAN.R-project.org/package=sjPlot.
Meyer, David, Achim Zeileis, and Kurt Hornik. 2006. “The Strucplot Framework: Visualizing Multi-Way Contingency Tables with Vcd.” Journal of Statistical Software 17 (3): 1–48. https://doi.org/10.18637/jss.v017.i03.
Meyer, David, Achim Zeileis, Kurt Hornik, and Michael Friendly. 2023. vcd: Visualizing Categorical Data. https://CRAN.R-project.org/package=vcd.
Zeileis, Achim, David Meyer, and Kurt Hornik. 2007. “Residual-Based Shadings for Visualizing (Conditional) Independence.” Journal of Computational and Graphical Statistics 16 (3): 507–25. https://doi.org/10.1198/106186007X237856.
Zhu, Hao. 2024. kableExtra: Construct Complex Table with “kable” and Pipe Syntax. https://CRAN.R-project.org/package=kableExtra.

Footnotes

  1. Piping Hot Data: Leveraging Labelled Data in R, https://www.pipinghotdata.com/posts/2020-12-23-leveraging-labelled-data-in-r/

  2. Label Support in R: https://cran.r-project.org/web/packages/sjlabelled/index.html

  3. Using the labelr package: https://cran.r-project.org/web/packages/labelr/vignettes/labelr-introduction.html

Citation

BibTeX citation:
@online{v.2022,
  author = {V., Arvind},
  title = {{Likert} {Plots:} {Plotting} {Survey} {Data}},
  date = {2022-12-27},
  url = {https://av-quarto.netlify.app/content/courses/Analytics/Descriptive/Modules/45-SurveyData},
  langid = {en},
  abstract = {Surveys, Questions, and Responses}
}
For attribution, please cite this work as:
V., Arvind. 2022. “Likert Plots: Plotting Survey Data.” December 27, 2022. https://av-quarto.netlify.app/content/courses/Analytics/Descriptive/Modules/45-SurveyData.