Saturday, December 30, 2017

The Mcomp Package

Mcomp Package Overview

Introduction

Makridakis Competitions (also known as the M Competitions or M-Competitions) are a series of competitions organized by teams led by forecasting researcher Spyros Makridakis and intended to evaluate and compare the accuracy of different forecasting methods. So far three competitions have taken place, named as M1 (1982), M2 (1993) and M3 (2000). The fourth competition is going to take place on year 2018 very soon.

The Mcomp package provides with the set of time series part of the M1 and M3 competitions. Specifically 1001 time series from the M1 competition and 3003 from the M3 one.

Packages

suppressPackageStartupMessages(library(Mcomp))
suppressPackageStartupMessages(library(ggplot2))
suppressPackageStartupMessages(library(gridExtra))
suppressPackageStartupMessages(library(Kmisc))

Analysis

The data is provided by the M1 and M3 lists. Their content summary can be easily printed out.

data(M1)
data(M3)
print(M1)
## M-Competition data: 1001 time series 
## 
##            Type of data
## Period      DEMOGR INDUST MACRO1 MACRO2 MICRO1 MICRO2 MICRO3 Total
##   MONTHLY       75    183     64     92     10     89    104   617
##   QUARTERLY     39     18     45     59      5     21     16   203
##   YEARLY        30     35     30     29     16     29     12   181
##   Total        144    236    139    180     31    139    132  1001
print(M3)
## M-Competition data: 3003 time series 
## 
##            Type of data
## Period      DEMOGRAPHIC FINANCE INDUSTRY MACRO MICRO OTHER Total
##   MONTHLY           111     145      334   312   474    52  1428
##   OTHER               0      29        0     0     4   141   174
##   QUARTERLY          57      76       83   336   204     0   756
##   YEARLY            245      58      102    83   146    11   645
##   Total             413     308      519   731   828   204  3003

Monthly, quarterly, yearly and some other time aggregation are available (row names) splitted among a pool of data categories (column names).

Each member of the M1 or M3 list is a Mdata object providing with:

  • sn, name of the series
  • st, series number and period. For example "Y1" denotes first yearly series, "Q20" denotes 20th quarterly series and so on
  • n, the number of observations in the time series
  • h, the number of required forecasts period Interval of the time series. Possible values are "YEARLY", "QUARTERLY", "MONTHLY" & "OTHER"
  • type The type of series. Possible values for M1 are "DEMOGR", "INDUST", "MACRO1", "MACRO2", "MICRO1", "MICRO2" & "MICRO3". Possible values for M3 are "DEMOGRAPHIC", "FINANCE", "INDUSTRY", "MACRO", "MICRO", "OTHER"
  • description, a short description of the time series
  • x, a time series of length n (the historical data)
  • xx, a time series of length h (the future data)

Let us have a look at first ten element of the M1 and M3 lists.

names(M1)[1:10]
##  [1] "YAF2"  "YAF3"  "YAF4"  "YAF5"  "YAF6"  "YAF7"  "YAF8"  "YAF9" 
##  [9] "YAF10" "YAF11"
names(M3)[1:10]
##  [1] "N0001" "N0002" "N0003" "N0004" "N0005" "N0006" "N0007" "N0008"
##  [9] "N0009" "N0010"

Each element is a Mdata S3 class instance. For example:

ts_1 <- M3[["N0001"]]
print(ts_1)
## Series: Y1
## Type of series: MICRO
## Period of series: YEARLY
## Series description: SALES ( CODE= ABT)
## 
## HISTORICAL data
## Time Series:
## Start = 1975 
## End = 1988 
## Frequency = 1 
##  [1]  940.66 1084.86 1244.98 1445.02 1683.17 2038.15 2342.52 2602.45
##  [9] 2927.87 3103.96 3360.27 3807.63 4387.88 4936.99
## 
## FUTURE data
## Time Series:
## Start = 1989 
## End = 1994 
## Frequency = 1 
## [1] 5379.75 6158.68 6876.58 7851.91 8407.84 9156.01
class(ts_1)
## [1] "Mdata"
str(ts_1)
## List of 9
##  $ st         : chr "Y1"
##  $ type       : chr "MICRO"
##  $ period     : chr "YEARLY"
##  $ description: chr "SALES ( CODE= ABT)"
##  $ sn         : chr "N0001"
##  $ x          : Time-Series [1:14] from 1975 to 1988: 941 1085 1245 1445 1683 ...
##  $ xx         : Time-Series [1:6] from 1989 to 1994: 5380 6159 6877 7852 8408 ...
##  $ h          : num 6
##  $ n          : int 14
##  - attr(*, "class")= chr "Mdata"

Plot of the time series can be achieved by the plot() or autoplot() functions, as shown below.

idx <- sample(1:length(M3), 12)

par(mfrow=c(4,3))
invisible(lapply(idx, function(x) { plot(M3[[x]])}))

par(mfrow=c(1,1))
plot_list <- invisible(lapply(idx, function(x) { autoplot(M3[[x]])}))
grob_plots <- invisible(lapply(chunk(1, length(plot_list), 12), function(x) {
  marrangeGrob(grobs=lapply(plot_list[x], ggplotGrob), nrow=4, ncol=3)}))
grob_plots
## $chunk1

We can figure out an helper function that allows the comparison of accuracy and shows resulting plots for the forecasts based on two specified fit methods and parameters. The horizon future is based on the length of future available data.

prepare_arg_list <- function(fit_func, x) {
  func_sig <- names(formals(fit_func))
  arg_list <- list()
  if ("y" %in% func_sig) {
    arg_list <- list(y=x)
  } else if ("data" %in% func_sig) {
    arg_list <- list(data=x)
  } else if ("x" %in% func_sig) {
    arg_list <- list(x=x)
  }
  arg_list
}

analyze_forecast <- function(Mobj, ts_idx, fit_func1, args1 = NULL, fit_func2, args2 = NULL) {
  ts <- Mobj[[ts_idx]]
  x <- ts$x
  xx <- ts$xx
  h <- length(xx)
  
  arg_list1 <- prepare_arg_list(fit_func1, x)
  model_1 <- do.call(fit_func1, args=c(arg_list1, args1))
  ts_fc_1 <- forecast(model_1, h=h)
  
  arg_list2 <- prepare_arg_list(fit_func2, x)
  model_2 <- do.call(fit_func2, args=c(arg_list2, args2))
  ts_fc_2 <- forecast(model_2, h=h)
  
  cat(paste("Accuracy Forecast Method:", substitute(fit_func1), "\n"))
  print(accuracy(ts_fc_1, xx))
  cat("\n")
  cat(paste("Accuracy Forecast Method:", substitute(fit_func2), "\n"))
  print(accuracy(ts_fc_2, xx))

  par(mfrow=c(1,2))
  
  plot(ts_fc_1)
  lines(xx, col ='red')
  
  plot(ts_fc_2)
  lines(xx, col='red')
  
  par(mfrow=c(1,1))
  
  list(fc1=ts_fc_1, fc2=ts_fc_2)
}

Then we can use it to compare forecast method for a given time series, future horizon and forecast methods. Some examples follows.

analyze_forecast(M3, 1, ets, "ZZZ", auto.arima, list(stepwise=FALSE))
## Accuracy Forecast Method: ets 
##                     ME      RMSE       MAE        MPE     MAPE      MASE
## Training set  15.68064  97.95396  81.58716 -0.1594235 3.688927 0.2654018
## Test set     445.03563 577.31243 480.61641  5.3426543 6.004038 1.5634378
##                   ACF1 Theil's U
## Training set 0.1011023        NA
## Test set     0.4859372 0.7108304
## 
## Accuracy Forecast Method: auto.arima 
##                     ME      RMSE      MAE      MPE     MAPE      MASE
## Training set  28.88508  88.49504  68.3793 1.096870 2.439216 0.2224368
## Test set     446.25333 578.60264 481.7033 5.358426 6.017378 1.5669735
##                   ACF1 Theil's U
## Training set 0.0146484        NA
## Test set     0.4860139  0.712472

## $fc1
##      Point Forecast    Lo 80     Hi 80    Lo 95     Hi 95
## 1989       5486.492 5164.145  5808.839 4993.505  5979.479
## 1990       6035.932 5300.681  6771.184 4911.463  7160.402
## 1991       6585.373 5324.965  7845.780 4657.745  8513.000
## 1992       7134.813 5242.131  9027.495 4240.205 10029.420
## 1993       7684.253 5053.545 10314.961 3660.932 11707.574
## 1994       8233.693 4758.261 11709.125 2918.478 13548.908
## 
## $fc2
##      Point Forecast    Lo 80    Hi 80    Lo 95     Hi 95
## 1989        5486.10 5363.602 5608.598 5298.756  5673.444
## 1990        6035.21 5761.297 6309.123 5616.295  6454.125
## 1991        6584.32 6125.975 7042.665 5883.342  7285.298
## 1992        7133.43 6462.482 7804.378 6107.303  8159.557
## 1993        7682.54 6774.072 8591.008 6293.158  9071.922
## 1994        8231.65 7063.095 9400.205 6444.500 10018.800
analyze_forecast(M3, 700, stlm, list(method=c("arima")), stlm, list(method=c("ets")))
## Accuracy Forecast Method: stlm 
##                     ME     RMSE      MAE        MPE      MAPE      MASE
## Training set -11.90287 402.5976 252.8155 -0.4288085  4.232485 0.3940372
## Test set     792.38609 863.2711 792.3861 11.8446454 11.844645 1.2350097
##                   ACF1 Theil's U
## Training set 0.1205458        NA
## Test set     0.2029283  1.993663
## 
## Accuracy Forecast Method: stlm 
##                     ME     RMSE      MAE        MPE      MAPE      MASE
## Training set -11.42326 402.6201 253.3089 -0.4211643  4.240385 0.3948063
## Test set     792.33085 863.2204 792.3309 11.8438007 11.843801 1.2349236
##                   ACF1 Theil's U
## Training set 0.1206962        NA
## Test set     0.2029283  1.993547

## $fc1
##         Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## 1993 Q1       5799.191 5275.923 6322.459 4998.921 6599.461
## 1993 Q2       5796.247 5056.234 6536.260 4664.494 6928.000
## 1993 Q3       5838.818 4932.490 6745.145 4452.709 7224.926
## 1993 Q4       5624.900 4578.363 6671.437 4024.360 7225.440
## 1994 Q1       5799.191 4629.127 6969.255 4009.733 7588.649
## 1994 Q2       5796.247 4514.506 7077.988 3835.994 7756.500
## 1994 Q3       5838.818 4454.380 7223.256 3721.502 7956.133
## 1994 Q4       5624.900 4144.874 7104.926 3361.395 7888.405
## 
## $fc2
##         Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## 1993 Q1       5799.246 5340.654 6257.838 5097.890 6500.602
## 1993 Q2       5796.302 5147.163 6445.441 4803.530 6789.075
## 1993 Q3       5838.873 5043.090 6634.656 4621.828 7055.917
## 1993 Q4       5624.955 4705.186 6544.724 4218.290 7031.621
## 1994 Q1       5799.246 4769.926 6828.566 4225.037 7373.456
## 1994 Q2       5796.302 4667.653 6924.951 4070.183 7522.422
## 1994 Q3       5838.873 4618.618 7059.128 3972.654 7705.092
## 1994 Q4       5624.955 4319.189 6930.721 3627.958 7621.952
analyze_forecast(M3, 700, stlm, list(method=c("arima")), ets)
## Accuracy Forecast Method: stlm 
##                     ME     RMSE      MAE        MPE      MAPE      MASE
## Training set -11.90287 402.5976 252.8155 -0.4288085  4.232485 0.3940372
## Test set     792.38609 863.2711 792.3861 11.8446454 11.844645 1.2350097
##                   ACF1 Theil's U
## Training set 0.1205458        NA
## Test set     0.2029283  1.993663
## 
## Accuracy Forecast Method: ets 
##                     ME     RMSE     MAE        MPE      MAPE      MASE
## Training set -38.53996 477.8266 314.552 -0.9081287  5.189619 0.4902595
## Test set     932.19799 990.8932 932.198 13.9795463 13.979546 1.4529200
##                    ACF1 Theil's U
## Training set 0.02021015        NA
## Test set     0.25180619  2.285006

## $fc1
##         Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## 1993 Q1       5799.191 5275.923 6322.459 4998.921 6599.461
## 1993 Q2       5796.247 5056.234 6536.260 4664.494 6928.000
## 1993 Q3       5838.818 4932.490 6745.145 4452.709 7224.926
## 1993 Q4       5624.900 4578.363 6671.437 4024.360 7225.440
## 1994 Q1       5799.191 4629.127 6969.255 4009.733 7588.649
## 1994 Q2       5796.247 4514.506 7077.988 3835.994 7756.500
## 1994 Q3       5838.818 4454.380 7223.256 3721.502 7956.133
## 1994 Q4       5624.900 4144.874 7104.926 3361.395 7888.405
## 
## $fc2
##         Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## 1993 Q1       5624.977 5120.941 6129.013 4854.121 6395.833
## 1993 Q2       5624.977 4911.328 6338.626 4533.545 6716.409
## 1993 Q3       5624.977 4749.886 6500.068 4286.640 6963.314
## 1993 Q4       5624.977 4613.281 6636.673 4077.722 7172.232
## 1994 Q1       5624.977 4492.488 6757.466 3892.984 7356.970
## 1994 Q2       5624.977 4382.881 6867.073 3725.355 7524.599
## 1994 Q3       5624.977 4281.718 6968.236 3570.640 7679.314
## 1994 Q4       5624.977 4187.213 7062.741 3426.107 7823.847
analyze_forecast(M3, 700, tslm, list(formula = x ~ trend + season), ets)
## Accuracy Forecast Method: tslm 
##                         ME     RMSE       MAE        MPE      MAPE
## Training set -5.050281e-14  614.945  535.8089 -0.9563518  8.701461
## Test set      1.260770e+03 1313.898 1260.7700 19.0010624 19.001062
##                   MASE      ACF1 Theil's U
## Training set 0.8351095 0.7369323        NA
## Test set     1.9650310 0.2691452  3.065128
## 
## Accuracy Forecast Method: ets 
##                     ME     RMSE     MAE        MPE      MAPE      MASE
## Training set -38.53996 477.8266 314.552 -0.9081287  5.189619 0.4902595
## Test set     932.19799 990.8932 932.198 13.9795463 13.979546 1.4529200
##                    ACF1 Theil's U
## Training set 0.02021015        NA
## Test set     0.25180619  2.285006

## $fc1
##         Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## 1993 Q1       5425.700 4469.084 6382.316 3935.752 6915.648
## 1993 Q2       5448.222 4491.606 6404.839 3958.275 6938.170
## 1993 Q3       5441.478 4484.861 6398.094 3951.530 6931.425
## 1993 Q4       5176.700 4220.084 6133.316 3686.752 6666.648
## 1994 Q1       5272.460 4297.971 6246.949 3754.676 6790.244
## 1994 Q2       5294.982 4320.494 6269.471 3777.198 6812.766
## 1994 Q3       5288.238 4313.749 6262.726 3770.454 6806.022
## 1994 Q4       5023.460 4048.971 5997.949 3505.676 6541.244
## 
## $fc2
##         Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## 1993 Q1       5624.977 5120.941 6129.013 4854.121 6395.833
## 1993 Q2       5624.977 4911.328 6338.626 4533.545 6716.409
## 1993 Q3       5624.977 4749.886 6500.068 4286.640 6963.314
## 1993 Q4       5624.977 4613.281 6636.673 4077.722 7172.232
## 1994 Q1       5624.977 4492.488 6757.466 3892.984 7356.970
## 1994 Q2       5624.977 4382.881 6867.073 3725.355 7524.599
## 1994 Q3       5624.977 4281.718 6968.236 3570.640 7679.314
## 1994 Q4       5624.977 4187.213 7062.741 3426.107 7823.847

There are also available forecasts produced by M3 competitors as provided by the M3Forecast list. Herein below the name of M3 competitors

names(M3Forecast)
##  [1] "NAIVE2"       "SINGLE"       "HOLT"         "DAMPEN"      
##  [5] "WINTER"       "COMB S-H-D"   "B-J auto"     "AutoBox1"    
##  [9] "AutoBox2"     "AutoBox3"     "ROBUST-Trend" "ARARMA"      
## [13] "Auto-ANN"     "Flors-Pearc1" "Flors-Pearc2" "PP-Autocast" 
## [17] "ForecastPro"  "SMARTFCS"     "THETAsm"      "THETA"       
## [21] "RBF"          "ForcX"        "AAM1"         "AAM2"

An helper function allows to evaluate the accuracy of those M3 competitors forecasts.

m3_competition_accuracy <- function(ts_idx, competitorName, plot=FALSE) {
  v <- na.omit(as.numeric(M3Forecast[[competitorName]][ts_idx,]))
  xx <- M3[[ts_idx]]$xx
  sn <- M3[[ts_idx]]$sn
  v_ts <- ts(v, frequency = frequency(xx), start = start(xx))

  if (plot) {
    plot(M3[[ts_idx]], main = sn) 
    legend("topleft", legend = c("Actual", competitorName), fill = c("red", "blue"))
    lines(v_ts, col = 'blue')
  }

  accuracy(v_ts, xx)
}
m3_competition_accuracy(150, "HOLT", TRUE)

##                ME     RMSE      MAE        MPE     MAPE      ACF1
## Test set 571.4183 2130.361 1913.418 -0.1041563 32.05945 0.5902855
##          Theil's U
## Test set  1.721647
m3_competition_accuracy(150, "ROBUST-Trend", TRUE)

##                ME    RMSE     MAE      MPE     MAPE      ACF1 Theil's U
## Test set 923.1167 2108.68 1762.95 7.199077 27.50348 0.5838244  1.512274

Conclusions

Mcomp package represents a valuable resource for time series analysis. There is also available the Tcomp package providing with 1311 time series of class Mcomp from the tourism forecasting competition conducted in 2010.