library(tidyverse)
library(mosaic)
library(ggformula)
##########################################
# Install core TimeSeries Packages
# library(ctv)
# ctv::install.views("TimeSeries", coreOnly = TRUE)
# To update core TimeSeries packages
# ctv::update.views("TimeSeries")
# Time Series Core Packages
##########################################
library(tsibble)
library(feasts) # Feature Extraction and Statistics for Time Series
library(fable) # Forecasting Models for Tidy Time Series
library(tseries) # Time Series Analysis and Computational Finance
library(forecast)
library(zoo)
##########################################
library(tsibbledata) # Time Series Demo Datasets
Lab-12: Time is a Him!!
Time Series in R
Introduction
Time series data are important in data visualization wherever events have a temporal dimension, for example in finance, transportation, music, and telecommunications.
Introduction to Time Series: Data Formats
There are multiple formats for time series data.
- The base ts format: the stats::ts() function converts a numeric vector into an R time series object. The format is ts(vector, start=, end=, frequency=), where start and end are the times of the first and last observations and frequency is the number of observations per unit time (1=annual, 4=quarterly, 12=monthly, etc.). Used by established packages like forecast.
- Tibble format: the simplest is of course the standard tibble / data frame, with a time variable to indicate that the other variables vary with time. Used by more recent packages such as timetk and modeltime.
- The modern tsibble (time series tibble) format: this is a newer format for time series analysis, used by the tidyverts set of packages (fable, feasts and others).
- There is also a tsbox package from rOpenSci that allows easy inter-conversion between these (and other!) formats.
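As a small illustration of moving between formats, the sketch below converts the built-in AirPassengers ts object to a tsibble and back. It uses tsibble's own as_tsibble() and as.ts() converters rather than tsbox, and assumes the tsibble package is installed; the object names are only illustrative.

```r
library(tsibble)

# ts -> tsibble: the time index becomes a column (named `index`),
# and the observations go into a column named `value`
air_tsibble <- as_tsibble(AirPassengers)
air_tsibble

# tsibble -> ts: recover a base R time series from the tsibble
air_ts_again <- as.ts(air_tsibble)
frequency(air_ts_again) # monthly data, so 12 observations per year
```

The round trip should leave the values unchanged; only the container around them differs.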
Creating time series
In this first example, we will use simple ts data, then a tibble dataset, and finally a tsibble-formatted dataset.
ts format data
There are a few datasets in base R that are already in ts format.
AirPassengers
Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
1949 112 118 132 129 121 135 148 148 136 119 104 118
1950 115 126 141 135 125 149 170 170 158 133 114 140
1951 145 150 178 163 172 178 199 199 184 162 146 166
1952 171 180 193 181 183 218 230 242 209 191 172 194
1953 196 196 236 235 229 243 264 272 237 211 180 201
1954 204 188 235 227 234 264 302 293 259 229 203 229
1955 242 233 267 269 270 315 364 347 312 274 237 278
1956 284 277 317 313 318 374 413 405 355 306 271 306
1957 315 301 356 348 355 422 465 467 404 347 305 336
1958 340 318 362 348 363 435 491 505 404 359 310 337
1959 360 342 406 396 420 472 548 559 463 407 362 405
1960 417 391 419 461 472 535 622 606 508 461 390 432
str(AirPassengers)
Time-Series [1:144] from 1949 to 1961: 112 118 132 129 121 135 148 148 136 119 ...
This can be easily plotted using base R:
plot(AirPassengers)
Let us take data that is “time oriented” but not in ts format, and convert it to ts. The syntax of ts() is:
objectName <- ts(data, start, end, frequency)
where
- data represents the data vector
- start represents the time of the first observation in the time series
- end represents the time of the last observation in the time series
- frequency represents the number of observations per unit time. For example, frequency=12 for monthly data.
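A minimal sketch of this syntax, using base R only. The values in monthly_values are made up for illustration, and the start date is an arbitrary choice.

```r
# Hypothetical monthly measurements for two years (24 values), made up for illustration
monthly_values <- 101:124

# Monthly series starting in January 2020: frequency = 12 observations per year
monthly_ts <- ts(monthly_values, start = c(2020, 1), frequency = 12)

start(monthly_ts)     # time of the first observation (year, month)
end(monthly_ts)       # time of the last observation, inferred from start and frequency
frequency(monthly_ts) # observations per unit time
```

Note that end did not need to be supplied: with 24 values, a start of January 2020 and 12 observations per year, ts() works out the final time point itself.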
We will pick simple numerical data that is not a time series object, from the ChickWeight dataset.
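The conversion code is not shown here, so the following is only one possible sketch. ChickWeight is a data frame rather than a bare vector, so the sketch assumes we take the weight column for a single chick (Chick "1") and keep only the evenly spaced measurement days 0, 2, ..., 20; both choices are assumptions made for illustration.

```r
# ChickWeight ships with base R (datasets package); it is a data frame
# with columns weight, Time (days since birth), Chick and Diet.
# Assumption: use Chick "1" only, and only the evenly spaced days 0, 2, ..., 20
chick1 <- ChickWeight[ChickWeight$Chick == "1" & ChickWeight$Time <= 20, ]

# One observation every 2 days, starting at day 0
chick1_ts <- ts(chick1$weight, start = 0, deltat = 2)
chick1_ts
plot(chick1_ts, xlab = "Days since birth", ylab = "Weight (g)")
```

Here deltat = 2 states the spacing between observations directly; it is an alternative to giving frequency, which would be 0.5 observations per day.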
ts format
The ts format is not recommended for new analysis, since it does not permit inclusion of multiple time series in one dataset, nor of other categorical variables for grouping.
tibble format data
Some “time-oriented” datasets are available in tibble form. Let us try to plot one, the walmart_sales_weekly dataset from the timetk package:
data(walmart_sales_weekly, package = "timetk")
walmart_sales_weekly
This dataset is a tibble with a Date column. The Dept column is clearly a categorical column that allows us to distinguish separate time series, i.e. one for each value of Dept. We will convert it to a factor (it is stored as a double-precision number) and then plot the data, grouped and coloured by Dept, with Date on the \(x\)-axis:
walmart_sales_weekly %>%
  # convert Dept number to a **categorical factor**
  mutate(Dept = forcats::as_factor(Dept)) %>%
  gf_point(Weekly_Sales ~ Date,
           group = ~ Dept,
           colour = ~ Dept, data = .) %>%
  gf_line() %>%
  gf_theme(theme_minimal())
For more analysis and forecasting etc., it is useful to convert this tibble into a tsibble:
walmart_tsibble <- as_tsibble(walmart_sales_weekly,
                              index = Date,
                              key = c(id, Dept))
walmart_tsibble
The 7D interval in the printed header indicates that the data is weekly. There is a Date column, and all the other numerical variables are time-varying quantities. The categorical variables id and Dept allow us to identify separate time series in the data; they have 7 combinations, hence there are 7 time series in this data, as indicated.
Let us plot Weekly_Sales, colouring the time series by Dept.
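The plotting code for this step is not shown, so here is one possible sketch in the same ggformula style as the earlier plot. It rebuilds walmart_tsibble so that it runs on its own, and it assumes the timetk, tsibble, dplyr, forcats and ggformula packages are installed; the name walmart_plot and the plot title are illustrative.

```r
library(dplyr)
library(ggformula)
library(tsibble)

# Rebuild the tsibble from the timetk data so the block runs on its own
data(walmart_sales_weekly, package = "timetk")
walmart_tsibble <- as_tsibble(walmart_sales_weekly,
                              index = Date,
                              key = c(id, Dept))

walmart_plot <- walmart_tsibble %>%
  # Dept is numeric; make it a factor so each series gets a discrete colour
  mutate(Dept = forcats::as_factor(Dept)) %>%
  gf_line(Weekly_Sales ~ Date, colour = ~ Dept,
          title = "Weekly Sales by Dept") %>%
  gf_theme(theme_minimal())
walmart_plot
```

Because a tsibble is still a tibble underneath, it can be piped into ggformula in the same way as the original data frame.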
tsibble format data
In the packages tsibbledata and fpp3 we have a good choice of tsibble format data. Let us pick one:
hh_budget
There are 4 key values (of the id variable Country) here, one for each country. The six other quantitative columns are the individual series. Let us plot (some of) the time series:
ggplot2::theme_set(theme_classic())
hh_budget %>%
gf_path(Debt ~ Year, colour = ~ Country,
title = "Debt over Time")
##
hh_budget %>%
gf_path(Savings ~ Year, colour = ~ Country,
title = "Savings over Time")
##
hh_budget %>%
gf_path(Expenditure ~ Year, colour = ~ Country,
title = "Expenditure over Time")
##
hh_budget %>%
gf_path(Wealth ~ Year, colour = ~ Country,
title = "Wealth over Time")
One more example
Often we have time-oriented data in table form, with a date-like column, and we need to convert it into a tsibble for analysis:
prison <- readr::read_csv("https://OTexts.com/fpp3/extrafiles/prison_population.csv")
glimpse(prison)
Rows: 3,072
Columns: 6
$ Date <date> 2005-03-01, 2005-03-01, 2005-03-01, 2005-03-01, 2005-03-01…
$ State <chr> "ACT", "ACT", "ACT", "ACT", "ACT", "ACT", "ACT", "ACT", "NS…
$ Gender <chr> "Female", "Female", "Female", "Female", "Male", "Male", "Ma…
$ Legal <chr> "Remanded", "Remanded", "Sentenced", "Sentenced", "Remanded…
$ Indigenous <chr> "ATSI", "Non-ATSI", "ATSI", "Non-ATSI", "ATSI", "Non-ATSI",…
$ Count <dbl> 0, 2, 0, 5, 7, 58, 5, 101, 51, 131, 145, 323, 355, 1617, 12…
We have a Date column for the time index, and we have key variables: State, Gender, Legal and Indigenous. Count is the value that varies over time. It also appears that the data is quarterly, since mosaic::inspect reports the max_diff in the Date column as \(92\) days.
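To see why a largest gap of 92 days points to quarterly data, here is a small base R check. It assumes the dates step forward three months at a time from 2005-03-01 (the first date in the glimpse output); the later dates are written out by hand for illustration and are not read from the file.

```r
# Quarter-start dates like those in the Date column
# (only the first one is taken from the glimpse output; the rest are assumed)
quarter_starts <- as.Date(c("2005-03-01", "2005-06-01", "2005-09-01",
                            "2005-12-01", "2006-03-01"))

# Gaps between consecutive dates, in days
gaps <- diff(quarter_starts)
gaps
max(gaps) # the longest quarter-to-quarter gap
```

Three-month steps give gaps of between 90 and 92 days depending on the months involved, so a maximum of 92 is what quarterly spacing would produce.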
mosaic::inspect(prison) # run this in your Console
prison_tsibble <- prison %>%
  mutate(quarter = yearquarter(Date)) %>%
  select(-Date) %>% # Remove the Date column now that we have quarters
  as_tsibble(index = quarter, key = c(State, Gender, Legal, Indigenous))
prison_tsibble
(Here, ATSI stands for Aboriginal or Torres Strait Islander.) We have \(64\) time series here, organized quarterly.
Let us examine the key variables.
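The code for this step is not shown. One way to inspect keys is with tsibble's key_vars(), n_keys() and key_data() helpers; the sketch below demonstrates them on a small made-up tsibble (toy_tsibble, with invented counts) so that it runs without downloading anything. The same three calls should apply unchanged to prison_tsibble.

```r
library(tsibble)

# A toy tsibble (made-up counts) with two key variables, State and Gender
toy_tsibble <- tsibble(
  quarter = yearquarter(rep(c("2005 Q1", "2005 Q2"), times = 4)),
  State   = rep(c("ACT", "NSW"), each = 4),
  Gender  = rep(rep(c("Female", "Male"), each = 2), times = 2),
  Count   = c(2, 3, 58, 60, 131, 140, 323, 330),
  index = quarter,
  key = c(State, Gender)
)

key_vars(toy_tsibble) # names of the key variables
n_keys(toy_tsibble)   # number of distinct series (key combinations)
key_data(toy_tsibble) # one row per series, with the rows belonging to it
```

Each distinct combination of key values defines one time series, which is how four key variables in the prison data give rise to many separate series.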
So we can plot the time series, faceted by State and coloured by Indigenous:
prison_tsibble %>%
  tsibble::index_by() %>%
  group_by(Indigenous, State) %>%
  # filter(State == "NSW") %>%
  summarise(Total = sum(Count)) %>%
  gf_point(Total ~ quarter, colour = ~ Indigenous,
           shape = ~ Indigenous) %>%
  gf_line() %>%
  # Note that the y-axes are all different!
  gf_facet_wrap(vars(State), scales = "free_y") %>%
  gf_theme(theme_minimal())
Hmm… looks like New South Wales (NSW) has something different going on compared to the rest of the states in Australia. Perhaps because of the large cities there…
Decomposing Time Series
We can decompose Weekly_Sales into components representing trends, seasonal events that repeat, and irregular noise. Since each Dept could have a different set of trends, we will do this first for one Dept, say Dept #95:
walmart_decomposed_season <- walmart_tsibble %>%
  dplyr::filter(Dept == 95) %>% # filter for Dept 95 (Dept is numeric here)
  #
  # STL() comes from feasts; feasts depends upon fabletools.
  #
  fabletools::model(
    season = STL(Weekly_Sales ~ season(window = "periodic")))
walmart_decomposed_season %>% fabletools::components()
###
walmart_decomposed_ets <- walmart_tsibble %>%
  dplyr::filter(Dept == 95) %>% # filter for Dept 95
  #
  # ETS() comes from fable, fitted here on Box-Cox transformed sales.
  #
  fabletools::model(
    ets = ETS(box_cox(Weekly_Sales, 0.3)))
###
walmart_decomposed_ets %>% fabletools::components()
###
walmart_decomposed_arima <- walmart_tsibble %>%
  dplyr::filter(Dept == 95) %>% # filter for Dept 95
  fabletools::model(arima = ARIMA(log(Weekly_Sales)))
walmart_decomposed_arima %>% broom::tidy()
walmart_decomposed_season %>%
  components() %>%
  autoplot() +
  labs(title = "Seasonal Variations in Weekly Sales, Dept #95")
walmart_decomposed_ets %>%
  components() %>%
  autoplot() +
  labs(title = "ETS Variations in Weekly Sales, Dept #95")
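To make the idea of components concrete, here is a minimal sketch that swaps in base R's stats::stl() (not the feasts STL() used above) on the built-in AirPassengers series, so it runs without any extra packages. It shows that an STL decomposition is additive: the seasonal, trend and remainder parts sum back to the original data. The object names are illustrative.

```r
# AirPassengers is a monthly ts object shipped with base R.
# s.window = "periodic" forces the seasonal pattern to be identical every year
air_stl <- stl(AirPassengers, s.window = "periodic")

# The fitted components: seasonal, trend and remainder, one column each
head(air_stl$time.series)

# The three components add back up to the original series
recombined <- rowSums(air_stl$time.series)
all.equal(as.numeric(recombined), as.numeric(AirPassengers))

plot(air_stl)
```

The window = "periodic" setting in the Walmart model plays the same role as s.window = "periodic" here: it asks for a fixed seasonal shape rather than one that drifts over time.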
Conclusion
TBW