EDA Workflow
EDA
Workflow
Descriptive
Abstract
A complete EDA Workflow
Setting up R Packages
Install packages using install.packages() in your Console. Load up your libraries in a setup chunk.
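A setup chunk along the following lines is one possibility. The package list is an assumption drawn from the functions named in this workflow; trim it to what your analysis actually uses.

```r
# Run once in the Console, not inside the document:
# install.packages(c("tidyverse", "mosaic", "skimr",
#                    "crosstable", "ggformula", "knitr"))

library(tidyverse)   # dplyr, tidyr, readr, ggplot2
library(mosaic)      # inspect()
library(skimr)       # skim()
library(crosstable)  # crosstable()
library(ggformula)   # gf_*() charts
library(knitr)       # kable()
```

Keeping install.packages() out of the document avoids reinstalling packages every time the file is rendered.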
Read Data
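A minimal sketch of this step, assuming the raw data is a CSV file. The file name is hypothetical, and a built-in data set is written to a temporary file first purely so that the example runs without external files.

```r
library(readr)

# Hypothetical file name; replace with the path to your own raw data.
raw_path <- file.path(tempdir(), "iris_raw.csv")
readr::write_csv(iris, raw_path)

# Read the file; show_col_types = FALSE silences the column-type message.
raw_data <- readr::read_csv(raw_path, show_col_types = FALSE)

# Quick sanity check on dimensions and column names.
dim(raw_data)
names(raw_data)
```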
Examine Data
- Use dplyr::glimpse()
- Use mosaic::inspect() or skimr::skim()
- Use dplyr::summarise() and crosstable::crosstable()
- Format your tables with knitr::kable()
- Highlight any interesting summary stats or data imbalances
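The steps above might look roughly like this. The built-in mtcars data stands in for your own data, and the summarised variables are chosen only for illustration.

```r
library(dplyr)
library(knitr)

# Column names, types, and the first few values.
dplyr::glimpse(mtcars)

# Group-wise summary statistics. The n column also exposes
# any imbalance in group sizes.
mpg_by_cyl <- mtcars %>%
  dplyr::group_by(cyl) %>%
  dplyr::summarise(
    n        = dplyr::n(),
    mean_mpg = mean(mpg),
    sd_mpg   = sd(mpg)
  )

# Render the summary as a formatted table.
knitr::kable(mpg_by_cyl, digits = 2,
             caption = "Mileage by number of cylinders")

# With the packages installed, these give richer overviews:
# skimr::skim(mtcars)
# mosaic::inspect(mtcars)
# crosstable::crosstable(mtcars, c(mpg, hp), by = cyl)
```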
Data Dictionary and Experiment Description
- A table containing the variable names, their interpretation, and their nature (Qual/Quant/Ord…)
- If there are wrongly coded variables in the original data, state them in their correct form, so you can munge them in the next step
- Declare what might be target and predictor variables, based on available information of the experiment, or a description of the data
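One way to hold such a dictionary is a small hand-written table. The sketch below covers a few mtcars variables, with interpretations taken from the ?mtcars help page; the target and predictor roles are assumptions made for illustration, not part of the data set.

```r
library(tibble)
library(knitr)

data_dictionary <- tibble::tribble(
  ~variable, ~interpretation,                            ~nature,                      ~role,
  "mpg",     "Miles per (US) gallon",                    "Quant",                      "target",
  "hp",      "Gross horsepower",                         "Quant",                      "predictor",
  "cyl",     "Number of cylinders",                      "Ord (coded as numeric)",     "predictor",
  "am",      "Transmission (0 = automatic, 1 = manual)", "Qual (coded as numeric)",    "predictor"
)

# Guard against typos: every documented variable must exist in the data.
stopifnot(all(data_dictionary$variable %in% names(mtcars)))

knitr::kable(data_dictionary)
```

Noting "coded as numeric" in the table flags the variables that need converting to factors during munging.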
Data Munging
- Convert variables to factors as needed
- Reformat / Rename other variables as needed
- Clean badly formatted columns (e.g. text + numbers) using the tidyr::separate_*_*() functions (e.g. tidyr::separate_wider_delim())
- Save the data as a modified file
- Do not modify or overwrite the original data file
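A sketch of these munging steps on a tiny, entirely hypothetical data frame. The column names, values, and output file name are invented for illustration, and tidyr 1.3.0 or later is assumed for separate_wider_delim().

```r
library(dplyr)
library(tidyr)

# Hypothetical messy data: dose mixes a text label and a number.
raw <- tibble::tibble(
  id   = 1:3,
  dose = c("low_10", "mid_20", "high_40"),
  sex  = c("m", "f", "f")
)

clean <- raw %>%
  # Split "low_10" into a label and a number at the underscore.
  tidyr::separate_wider_delim(dose, delim = "_",
                              names = c("dose_level", "dose_mg")) %>%
  dplyr::mutate(
    dose_mg    = as.numeric(dose_mg),
    dose_level = factor(dose_level,
                        levels = c("low", "mid", "high"), ordered = TRUE),
    sex        = factor(sex, levels = c("f", "m"),
                        labels = c("female", "male"))
  ) %>%
  dplyr::rename(participant_id = id)

# Save under a new name so the original file stays untouched.
out_path <- file.path(tempdir(), "data_modified.csv")
readr::write_csv(clean, out_path)
```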
Form Hypotheses
Question-1
- State the Question or Hypothesis
- (Temporarily) Drop variables using dplyr::select()
- Create new variables if needed with dplyr::mutate()
- Filter the data set using dplyr::filter()
- Reformat data if needed with tidyr::pivot_longer() or tidyr::pivot_wider()
- Answer the Question with a Table, a Chart, a Test, using an appropriate Model for Statistical Inference
- Use title, subtitle, legend and scales appropriately in your chart
- Prefer ggformula unless you are using a chart that is not yet supported therein (e.g. ggbump() or plot_likert())
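As a rough illustration of one Question and Inference cycle, the sketch below asks whether manual cars in the built-in mtcars data get better mileage than automatics. The question, the choice of a Welch t-test, and all labels are assumptions made for the example. The chart uses ggplot2 to keep dependencies small, with a ggformula version in the comments.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

cars <- mtcars %>%
  dplyr::select(mpg, hp, wt, am) %>%      # keep only the relevant variables
  dplyr::mutate(transmission = factor(am, levels = c(0, 1),
                                      labels = c("automatic", "manual"))) %>%
  dplyr::filter(!is.na(mpg))              # drop rows without a mileage value

# Table: group sizes and mean mileage per transmission type.
mpg_summary <- cars %>%
  dplyr::group_by(transmission) %>%
  dplyr::summarise(n = dplyr::n(), mean_mpg = mean(mpg), .groups = "drop")

# Long format, useful for faceting several measures in one chart.
cars_long <- cars %>%
  tidyr::pivot_longer(cols = c(mpg, hp, wt),
                      names_to = "measure", values_to = "value")

# Chart with title, subtitle, legend and scale set explicitly.
p <- ggplot(cars, aes(x = transmission, y = mpg, fill = transmission)) +
  geom_boxplot() +
  scale_y_continuous(limits = c(0, NA)) +
  labs(title = "Mileage by transmission type",
       subtitle = "mtcars data set",
       x = "Transmission", y = "Miles per gallon", fill = "Transmission")

# ggformula version of the same chart, if the package is installed:
# ggformula::gf_boxplot(mpg ~ transmission, data = cars,
#                       fill = ~transmission) %>%
#   ggformula::gf_labs(title = "Mileage by transmission type")

# Test: Welch two-sample t-test comparing the two groups.
test_result <- t.test(mpg ~ transmission, data = cars)
test_result
```

The table, chart, and test each address the same question, so the inference that follows can cite all three.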
Inference-1
. . . .
Question-n
Inference-n
One Most Interesting Graph
Conclusion
Describe what the graph shows and why it is so interesting. What could be done next?