Introduction
In this post, I am starting the definition of an ARMA-GARCH model for Dow Jones Industrial Average (DJIA) daily trade volume log-ratio.
Packages
The packages being used in this post series are herein listed.
suppressPackageStartupMessages(library(lubridate))
suppressPackageStartupMessages(library(fBasics))
suppressPackageStartupMessages(library(lmtest))
suppressPackageStartupMessages(library(urca))
suppressPackageStartupMessages(library(ggplot2))
suppressPackageStartupMessages(library(quantmod))
suppressPackageStartupMessages(library(PerformanceAnalytics))
suppressPackageStartupMessages(library(rugarch))
suppressPackageStartupMessages(library(FinTS))
suppressPackageStartupMessages(library(forecast))
suppressPackageStartupMessages(library(strucchange))
suppressPackageStartupMessages(library(TSA))
Getting Data
We upload the environment status.
load(file='DowEnvironment.RData')
Structural Breaks in Daily Trade Volume
In part 2, we already highlighted some level shifts occurring within the daily trade volume. Here is the plot we already commented in part 2.
plot(dj_vol)
Now we want to identify and date such level shifts. First we verify that a linear regression with constant mean is statistically significative.
lm_dj_vol <- lm(dj_vol ~ 1, data = dj_vol)
summary(lm_dj_vol)
##
## Call:
## lm(formula = dj_vol ~ 1, data = dj_vol)
##
## Residuals:
## Min 1Q Median 3Q Max
## -193537268 -91369768 -25727268 65320232 698562732
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 201947268 2060772 98 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 113200000 on 3019 degrees of freedom
Such linear regression model is statistically significative, hence it makes sense to proceed on with the breakpoints structural identification with respect a constant value.
bp_dj_vol <- breakpoints(dj_vol ~ 1, h = 0.1)
summary(bp_dj_vol)
##
## Optimal (m+1)-segment partition:
##
## Call:
## breakpoints.formula(formula = dj_vol ~ 1, h = 0.1)
##
## Breakpoints at observation number:
##
## m = 1 2499
## m = 2 896 2499
## m = 3 626 1254 2499
## m = 4 342 644 1254 2499
## m = 5 342 644 1219 1649 2499
## m = 6 320 622 924 1251 1649 2499
## m = 7 320 622 924 1251 1692 2172 2499
## m = 8 320 622 924 1251 1561 1863 2172 2499
##
## Corresponding to breakdates:
##
## m = 1
## m = 2 0.296688741721854
## m = 3 0.207284768211921
## m = 4 0.113245033112583 0.213245033112583
## m = 5 0.113245033112583 0.213245033112583
## m = 6 0.105960264900662 0.205960264900662 0.305960264900662
## m = 7 0.105960264900662 0.205960264900662 0.305960264900662
## m = 8 0.105960264900662 0.205960264900662 0.305960264900662
##
## m = 1
## m = 2
## m = 3 0.41523178807947
## m = 4 0.41523178807947
## m = 5 0.40364238410596 0.546026490066225
## m = 6 0.414238410596027 0.546026490066225
## m = 7 0.414238410596027 0.560264900662252
## m = 8 0.414238410596027 0.516887417218543 0.616887417218543
##
## m = 1 0.827483443708609
## m = 2 0.827483443708609
## m = 3 0.827483443708609
## m = 4 0.827483443708609
## m = 5 0.827483443708609
## m = 6 0.827483443708609
## m = 7 0.719205298013245 0.827483443708609
## m = 8 0.719205298013245 0.827483443708609
##
## Fit:
##
## m 0 1 2 3 4 5 6
## RSS 3.872e+19 2.772e+19 1.740e+19 1.547e+19 1.515e+19 1.490e+19 1.475e+19
## BIC 1.206e+05 1.196e+05 1.182e+05 1.179e+05 1.178e+05 1.178e+05 1.178e+05
##
## m 7 8
## RSS 1.472e+19 1.478e+19
## BIC 1.178e+05 1.178e+05
It gives this plot.
plot(bp_dj_vol)
The minimum BIC is reached at breaks = 6.
Here is the plot of the DJIA daily trade volume with level shifts (red line).
bp_fit <- fitted(bp_dj_vol, breaks=6)
bp_fit_xts <- xts(bp_fit, order.by = index(dj_vol))
bb <- merge(bp_fit_xts, dj_vol)
colnames(bb) <- c("Level", "DJI.Volume")
plot(bb, lwd = c(3,1), col = c("red", "black"))
Here are the calendar dates when such level shifts occur.
bd <- breakdates(bp_dj_vol)
bd <- as.integer(nrow(dj_vol)*bd)
index(dj_vol)[bd]
## [1] "2008-04-10" "2009-06-22" "2010-09-01" "2011-12-16" "2013-07-22"
## [6] "2016-12-02"
See ref. [10] for further details about structural breaks.
Daily Trade Volume Log-ratio Model
As shown in part 2, the daily trade volume log ratio:
\[ v_{t}\ := ln \frac{V_{t}}{V_{t-1}} \]
is already available within our data environment. We plot it.
plot(dj_vol_log_ratio)
Outliers Detection
The Return.clean function within Performance Analytics package is able to clean return time series from outliers. Here below we compare the original time series with the outliers adjusted one.
dj_vol_log_ratio_outliersadj <- Return.clean(dj_vol_log_ratio, "boudt")
p <- plot(dj_vol_log_ratio)
p <- addSeries(dj_vol_log_ratio_outliersadj, col = 'blue', on = 1)
p
The prosecution of the analysis will be carried on with the original time series as a more conservative approach to volatility evaluation.
Correlation plots
Here below are the total and partial correlation plots.
acf(dj_vol_log_ratio)
pacf(dj_vol_log_ratio)
Above plots may suggest some ARMA(p,q) model with p and q > 0. That will be verified within the prosecution of the present analysis.
Unit root tests
We run the Augmented Dickey-Fuller test as available within the urca package. The no trend and no drift test flavor is run.
(urdftest_lag = floor(12* (nrow(dj_vol_log_ratio)/100)^0.25))
## [1] 28
summary(ur.df(dj_vol_log_ratio, type = "none", lags = urdftest_lag, selectlags="BIC"))
##
## ###############################################
## # Augmented Dickey-Fuller Test Unit Root Test #
## ###############################################
##
## Test regression none
##
##
## Call:
## lm(formula = z.diff ~ z.lag.1 - 1 + z.diff.lag)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.51742 -0.13544 -0.02485 0.12006 1.98641
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## z.lag.1 -5.21571 0.25218 -20.682 < 2e-16 ***
## z.diff.lag1 3.59929 0.24637 14.609 < 2e-16 ***
## z.diff.lag2 3.14516 0.23796 13.217 < 2e-16 ***
## z.diff.lag3 2.78234 0.22809 12.198 < 2e-16 ***
## z.diff.lag4 2.41745 0.21700 11.140 < 2e-16 ***
## z.diff.lag5 2.14179 0.20472 10.462 < 2e-16 ***
## z.diff.lag6 1.88354 0.19168 9.826 < 2e-16 ***
## z.diff.lag7 1.66394 0.17781 9.358 < 2e-16 ***
## z.diff.lag8 1.41657 0.16304 8.688 < 2e-16 ***
## z.diff.lag9 1.19354 0.14771 8.080 9.31e-16 ***
## z.diff.lag10 1.00374 0.13207 7.600 3.95e-14 ***
## z.diff.lag11 0.88134 0.11634 7.575 4.75e-14 ***
## z.diff.lag12 0.74946 0.10027 7.474 1.02e-13 ***
## z.diff.lag13 0.60798 0.08399 7.239 5.73e-13 ***
## z.diff.lag14 0.48367 0.06732 7.185 8.46e-13 ***
## z.diff.lag15 0.36278 0.05108 7.102 1.53e-12 ***
## z.diff.lag16 0.21128 0.03470 6.088 1.29e-09 ***
## z.diff.lag17 0.07911 0.01833 4.315 1.65e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.2576 on 2972 degrees of freedom
## Multiple R-squared: 0.7476, Adjusted R-squared: 0.746
## F-statistic: 488.9 on 18 and 2972 DF, p-value: < 2.2e-16
##
##
## Value of test-statistic is: -20.6822
##
## Critical values for test statistics:
## 1pct 5pct 10pct
## tau1 -2.58 -1.95 -1.62
Based on reported test statistics compared with critical values, we reject the null hypothesis of unit root presence. See ref. [6] for further details about the Augmented Dickey-Fuller test.
ARMA model
We now determine the ARMA structure of our time series in order to run the ARCH effects test on the resulting residuals. That is in agreement with what outlined in ref. [4] $4.3.
We take advantage of auto.arima() function within forecast package (ref. [7]) to have an idea of what ARMA model to start with.
(auto_model_1 <- auto.arima(dj_vol_log_ratio, stepwise=FALSE))
## Series: dj_vol_log_ratio
## ARIMA(3,0,2) with zero mean
##
## Coefficients:
## ar1 ar2 ar3 ma1 ma2
## -0.5266 0.3769 0.1459 -0.0896 -0.7812
## s.e. 0.0970 0.0401 0.0203 0.0968 0.0872
##
## sigma^2 estimated as 0.0659: log likelihood=-176.48
## AIC=364.96 AICc=364.99 BIC=401.03
coeftest(auto_model_1)
##
## z test of coefficients:
##
## Estimate Std. Error z value Pr(>|z|)
## ar1 -0.526617 0.096988 -5.4297 5.644e-08 ***
## ar2 0.376906 0.040090 9.4014 < 2.2e-16 ***
## ar3 0.145902 0.020324 7.1787 7.039e-13 ***
## ma1 -0.089611 0.096788 -0.9258 0.3545
## ma2 -0.781163 0.087159 -8.9626 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The ma1 coefficient is not significative. Hence we try with the following ARMA(2,3) model.
(arma_model_2 <- arima(dj_vol_log_ratio, order=c(2,0,3), include.mean = FALSE))
##
## Call:
## arima(x = dj_vol_log_ratio, order = c(2, 0, 3), include.mean = FALSE)
##
## Coefficients:
## ar1 ar2 ma1 ma2 ma3
## -0.1802 0.6441 -0.4351 -0.8604 0.3596
## s.e. 0.0643 0.0454 0.0681 0.0493 0.0423
##
## sigma^2 estimated as 0.0658: log likelihood = -176.9, aic = 363.79
coeftest(arma_model_2)
##
## z test of coefficients:
##
## Estimate Std. Error z value Pr(>|z|)
## ar1 -0.180233 0.064315 -2.8023 0.005073 **
## ar2 0.644104 0.045449 14.1721 < 2.2e-16 ***
## ma1 -0.435122 0.068126 -6.3870 1.692e-10 ***
## ma2 -0.860443 0.049282 -17.4595 < 2.2e-16 ***
## ma3 0.359564 0.042307 8.4990 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
All coefficients are significative and AIC is lower than the one of the first model. We then try with ARMA(1,2).
(arma_model_3 <- arima(dj_vol_log_ratio, order=c(1,0,2), include.mean = FALSE))
##
## Call:
## arima(x = dj_vol_log_ratio, order = c(1, 0, 2), include.mean = FALSE)
##
## Coefficients:
## ar1 ma1 ma2
## 0.6956 -1.3183 0.3550
## s.e. 0.0439 0.0518 0.0453
##
## sigma^2 estimated as 0.06598: log likelihood = -180.92, aic = 367.84
coeftest(arma_model_3)
##
## z test of coefficients:
##
## Estimate Std. Error z value Pr(>|z|)
## ar1 0.695565 0.043874 15.8537 < 2.2e-16 ***
## ma1 -1.318284 0.051787 -25.4557 < 2.2e-16 ***
## ma2 0.355015 0.045277 7.8409 4.474e-15 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
This model has the highest AIC among the set and all coefficients are significative.
We can also have a try with the eacf() function within the TSA package as further verification.
eacf(dj_vol_log_ratio)
## AR/MA
## 0 1 2 3 4 5 6 7 8 9 10 11 12 13
## 0 x o o x x o o x o o x o o o
## 1 x x o x o o o x o o x o o o
## 2 x x x x o o o o o o x o o o
## 3 x x x x o o o o o o x o o o
## 4 x x x x x o o o o o x o o o
## 5 x x x x o o o o o o o o o o
## 6 x x x x x o x o o o o o o o
## 7 x x x x x o o o o o o o o o
The upper left triangle with â€Å“O†as a vertex seems to be located somehow within {(1,2),(2,2),(1,3),(2,3)}, which represents the set of potential (p,q) values according to eacf function output. To remark that we prefer to consider parsimoniuos models, that is why we do not go too far as AR and MA orders.
We already verified ARMA models with (p,q) orders within the set {(3,2)(2,3)(1,2)}. Let us try {(2,2)(1,3)}
(arma_model_4 <- arima(dj_vol_log_ratio, order=c(2,0,2), include.mean = FALSE))
##
## Call:
## arima(x = dj_vol_log_ratio, order = c(2, 0, 2), include.mean = FALSE)
##
## Coefficients:
## ar1 ar2 ma1 ma2
## 0.7174 -0.0096 -1.3395 0.3746
## s.e. 0.1374 0.0560 0.1361 0.1247
##
## sigma^2 estimated as 0.06598: log likelihood = -180.9, aic = 369.8
coeftest(arma_model_4)
##
## z test of coefficients:
##
## Estimate Std. Error z value Pr(>|z|)
## ar1 0.7173631 0.1374135 5.2205 1.785e-07 ***
## ar2 -0.0096263 0.0560077 -0.1719 0.863536
## ma1 -1.3394720 0.1361208 -9.8403 < 2.2e-16 ***
## ma2 0.3746317 0.1247117 3.0040 0.002665 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The ar2 coefficient is not significative.
(arma_model_5 <- arima(dj_vol_log_ratio, order=c(1,0,3), include.mean = FALSE))
##
## Call:
## arima(x = dj_vol_log_ratio, order = c(1, 0, 3), include.mean = FALSE)
##
## Coefficients:
## ar1 ma1 ma2 ma3
## 0.7031 -1.3253 0.3563 0.0047
## s.e. 0.0657 0.0684 0.0458 0.0281
##
## sigma^2 estimated as 0.06598: log likelihood = -180.9, aic = 369.8
coeftest(arma_model_5)
##
## z test of coefficients:
##
## Estimate Std. Error z value Pr(>|z|)
## ar1 0.7030934 0.0656902 10.7032 < 2.2e-16 ***
## ma1 -1.3253176 0.0683526 -19.3894 < 2.2e-16 ***
## ma2 0.3563425 0.0458436 7.7730 7.664e-15 ***
## ma3 0.0047019 0.0280798 0.1674 0.867
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The ma3 coefficient is not significative.
As a conclusion, we keep both ARMA(1,2) and ARMA(2,3) as tentative mean models. We can now proceed on with ARCH effect tests.
ARCH effect Test
If ARCH effects are statistical significative for the residuals of our time series, a GARCH model is required.
We test the candidate mean model ARMA(2,3).
resid_dj_vol_log_ratio <- residuals(arma_model_2)
ArchTest(resid_dj_vol_log_ratio - mean(resid_dj_vol_log_ratio))
##
## ARCH LM-test; Null hypothesis: no ARCH effects
##
## data: resid_dj_vol_log_ratio - mean(resid_dj_vol_log_ratio)
## Chi-squared = 78.359, df = 12, p-value = 8.476e-12
Based on reported p-value, we reject the null hypothesis of no ARCH effects.
Let us have a look at the residual correlation plot too.
par(mfrow=c(1,2))
acf(resid_dj_vol_log_ratio)
pacf(resid_dj_vol_log_ratio)
We test the second candidate mean model, ARMA(1,2).
resid_dj_vol_log_ratio <- resid(arma_model_3)
ArchTest(resid_dj_vol_log_ratio - mean(resid_dj_vol_log_ratio))
##
## ARCH LM-test; Null hypothesis: no ARCH effects
##
## data: resid_dj_vol_log_ratio - mean(resid_dj_vol_log_ratio)
## Chi-squared = 74.768, df = 12, p-value = 4.065e-11
Based on reported p-value, we reject the null hypothesis of no ARCH effects.
Let us have a look at the residual correlation plots.
par(mfrow=c(1,2))
acf(resid_dj_vol_log_ratio)
pacf(resid_dj_vol_log_ratio)
To inspect asymmetries within the DJIA volume log-ratio, summary statistics and density plot are shown.
basicStats(dj_vol_log_ratio)
## DJI.Volume
## nobs 3019.000000
## NAs 0.000000
## Minimum -2.301514
## Maximum 2.441882
## 1. Quartile -0.137674
## 3. Quartile 0.136788
## Mean -0.000041
## Median -0.004158
## Sum -0.124733
## SE Mean 0.005530
## LCL Mean -0.010885
## UCL Mean 0.010802
## Variance 0.092337
## Stdev 0.303869
## Skewness -0.182683
## Kurtosis 9.463384
Negative skewness is reported.
plot(density(dj_vol_log_ratio))
As a result, also for daily trade volume log-ratio the eGARCH model will be proposed.
Saving the current enviroment for further analysis.
save.image(file='DowEnvironment.RData')
References
[1] Dow Jones Industrial Average [https://en.wikipedia.org/wiki/Dow_Jones_Industrial_Average]
[2] Skewness [https://en.wikipedia.org/wiki/Skewness]
[3] Kurtosis [https://en.wikipedia.org/wiki/Kurtosis]
[4] An introduction to analysis of financial data with R, Wiley, Ruey S. Tsay [https://www.wiley.com/en-us/An+Introduction+to+Analysis+of+Financial+Data+with+R-p-9780470890813]
[5] Time series analysis and its applications, Springer ed., R.H. Shumway, D.S. Stoffer [https://www.springer.com/gp/book/9783319524511]
[6] Applied Econometric Time Series, Wiley, W. Enders, 4th ed. [https://www.wiley.com/en-us/Applied+Econometric+Time+Series%2C+4th+Edition-p-9781118808566]
[7] Forecasting - Principle and Practice, Texts, R.J. Hyndman [https://otexts.org/fpp2/]
[8] Options, Futures and other Derivatives, Pearson ed., J.C. Hull[https://www.pearson.com/us/higher-education/product/Hull-Options-Futures-and-Other-Derivatives-9th-Edition/9780133456318.html]
[9] An introduction to rugarch package [https://cran.r-project.org/web/packages/rugarch/vignettes/Introduction_to_the_rugarch_package.pdf]
[10] Applied Econometrics with R, Achim Zeileis, Christian Kleiber - Springer Ed. [http://www.springer.com/la/book/9780387773162]
[11] GARCH modeling: diagnostic tests [https://logicalerrors.wordpress.com/2017/08/14/garch-modeling-conditional-variance-useful-diagnostic-tests/]
Disclaimer
Any securities or databases referred in this post are solely for illustration purposes, and under no regard should the findings presented here be interpreted as investment advice or a promotion of any particular security or source.
No comments:
Post a Comment
Note: Only a member of this blog may post a comment.