Sunday, November 20, 2016

Financial products with capital protection barrier - part 8

Normality Tests

Abstract

In this post I will check the null hypothesis that our observations are normally distributed.

Analysis

I take advantage of the following tests to verify the normal distribution hypothesis:

  • Jarque Bera test

  • Lilliefors test

Further, I will show the Q-Q plot and compute metrics such as skewness and kurtosis.

# load the data saved in an earlier part of this series
load(file="structured-product-3.RData")
# load the required time series packages, suppressing startup messages
invisible(lapply(ts.package, function(x) {
  suppressPackageStartupMessages(library(x, character.only=TRUE)) }))
set.seed(1023)

qqnorm(GSPC_log_returns)
qqline(GSPC_log_returns)

Based on the Q-Q plot above, we can say that the tails of the empirical distribution do not match those of a normal distribution.

jarque.bera.test(GSPC_log_returns)
## 
## Results of Hypothesis Test
## --------------------------
## 
## Alternative Hypothesis:          
## 
## Test Name:                       Jarque Bera Test
## 
## Data:                            GSPC_log_returns
## 
## Test Statistic:                  X-squared = 3964.679
## 
## Test Statistic Parameter:        df = 2
## 
## P-value:                         0
lillie.test(GSPC_log_returns)
## 
## Results of Hypothesis Test
## --------------------------
## 
## Alternative Hypothesis:          
## 
## Test Name:                       Lilliefors (Kolmogorov-Smirnov) normality test
## 
## Data:                            GSPC_log_returns
## 
## Test Statistic:                  D = 0.07677486
## 
## P-value:                         1.359722e-09

Both tests report very small p-values, hence we can reject the null hypothesis of normally distributed observations.

kurtosis(GSPC_log_returns)
## [1] 15.04795
skewness(GSPC_log_returns)
## [1] -0.9384068

High positive kurtosis (leptokurtosis) indicates the presence of heavy tails or outliers.

Negative skewness indicates that the left tail is longer than the right one.
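For reference, both metrics can be computed directly as standardized moments, as in the minimal sketch below; exact values may differ slightly from the package functions used above depending on whether a population (n) or sample (n-1) denominator is adopted.

x <- as.numeric(GSPC_log_returns)
z <- (x - mean(x)) / sd(x)   # standardized observations
mean(z^3)  # skewness: negative values indicate a longer left tail
mean(z^4)  # kurtosis: equals 3 for a normal distribution, much larger here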

It is also relevant to compute a confidence interval for the standard deviation of our log returns time series.
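The computation below implements the standard chi-square based confidence interval for the standard deviation of a sample of size n with sample standard deviation s:

$$\left[\; s\sqrt{\frac{n-1}{\chi^2_{0.975,\,n-1}}},\;\; s\sqrt{\frac{n-1}{\chi^2_{0.025,\,n-1}}} \;\right]$$

Note that this interval formally relies on the very normality assumption under scrutiny, so it should be read as indicative.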

(n.obs <- length(GSPC_log_returns))
## [1] 640
(mean.obs <- mean(GSPC_log_returns))
## [1] 0.0003937656
(sigma <- sd(GSPC_log_returns))
## [1] 0.005840571
sigma_low <- sqrt((n.obs-1)*sigma^2/qchisq(.975, df=n.obs-1))
sigma_up <- sqrt((n.obs-1)*sigma^2/qchisq(.025, df=n.obs-1))
sigma.ci <- data.frame("sigma_low" = sigma_low, 
                       "sigma" = sigma,
                       "sigma_up" = sigma_up)

suppressPackageStartupMessages(library(knitr))
kable(sigma.ci, caption="Standard deviation 95% confidence interval")
Standard deviation 95% confidence interval

 sigma_low      sigma   sigma_up
----------  ---------  ---------
 0.0055372  0.0058406  0.0061794

Density plots of the empirical distribution against normal distributions parametrized with the 95% confidence interval standard deviation values are also of interest.

par(mfrow=c(1,2))

# normal densities parametrized with the point estimate and the lower and
# upper bounds of the standard deviation confidence interval
dnorm_1 <- function(x) {
  dnorm(x, mean.obs, sigma.ci$sigma)
}

dnorm_2 <- function(x) {
  dnorm(x, mean.obs, sigma.ci$sigma_low)
}

dnorm_3 <- function(x) {
  dnorm(x, mean.obs, sigma.ci$sigma_up)
}

# left panel: empirical density against the theoretical normal density curves
plot(density(GSPC_log_returns), main = "Distribution density plot")
curve(dnorm_1, add = TRUE, col = 'blue')
curve(dnorm_2, add = TRUE, col = 'red')
curve(dnorm_3, add = TRUE, col = 'green')

# right panel: empirical density against densities of simulated normal samples
plot(density(GSPC_log_returns), main = "Distribution density plot")
lines(density(rnorm(n.obs, mean.obs, sigma.ci$sigma)), col='blue')
lines(density(rnorm(n.obs, mean.obs, sigma.ci$sigma_low)), col='red')
lines(density(rnorm(n.obs, mean.obs, sigma.ci$sigma_up)), col='green')

The leptokurtosis and the higher peakedness of the empirical distribution compared to the normal ones are evident.

We may wonder whether the null hypothesis of normality would still be rejected once the outliers are removed.
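The outliers_l and outliers_r index vectors used below come from the loaded data and were presumably determined in an earlier post of this series. For readers joining here, a hypothetical reconstruction based on the 1.5*IQR boxplot rule (the original criterion may differ) could look like:

# hypothetical reconstruction (assumption: boxplot 1.5*IQR rule) of the
# left and right tail outlier indexes used below
bp <- boxplot.stats(as.numeric(GSPC_log_returns))
outliers_l <- which(GSPC_log_returns < bp$stats[1])  # below the lower whisker
outliers_r <- which(GSPC_log_returns > bp$stats[5])  # above the upper whisker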

GSPC_log_returns_no_outliers <- GSPC_log_returns[-c(outliers_l, outliers_r)]

qqnorm(GSPC_log_returns_no_outliers)
qqline(GSPC_log_returns_no_outliers)

jarque.bera.test(GSPC_log_returns_no_outliers)
## 
## Results of Hypothesis Test
## --------------------------
## 
## Alternative Hypothesis:          
## 
## Test Name:                       Jarque Bera Test
## 
## Data:                            GSPC_log_returns_no_outliers
## 
## Test Statistic:                  X-squared = 19.78535
## 
## Test Statistic Parameter:        df = 2
## 
## P-value:                         5.054349e-05
lillie.test(GSPC_log_returns_no_outliers)
## 
## Results of Hypothesis Test
## --------------------------
## 
## Alternative Hypothesis:          
## 
## Test Name:                       Lilliefors (Kolmogorov-Smirnov) normality test
## 
## Data:                            GSPC_log_returns_no_outliers
## 
## Test Statistic:                  D = 0.06213478
## 
## P-value:                         4.657062e-06
kurtosis(GSPC_log_returns_no_outliers)
## [1] 3.656702
skewness(GSPC_log_returns_no_outliers)
## [1] 0.2834059

As we can see above, we still have to reject the null hypothesis of normal distribution. However, kurtosis and skewness improve considerably once the outliers are removed.

par(mfrow=c(1,2))

plot(density(GSPC_log_returns_no_outliers), main = "Distribution density plot")
curve(dnorm_1, add = TRUE, col = 'blue')
curve(dnorm_2, add = TRUE, col = 'red')
curve(dnorm_3, add = TRUE, col = 'green')

plot(density(GSPC_log_returns_no_outliers), main = "Distribution density plot")
lines(density(rnorm(n.obs, mean.obs, sigma.ci$sigma)), col='blue')
lines(density(rnorm(n.obs, mean.obs, sigma.ci$sigma_low)), col='red')
lines(density(rnorm(n.obs, mean.obs, sigma.ci$sigma_up)), col='green')

The plots above show the density of the log returns without outliers against three normal distributions parametrized with the 95% confidence interval standard deviation values.

One may additionally wonder whether 640 samples generated by the rnorm() function would pass the normality tests.

norm.samples <- rnorm(n.obs, mean.obs, sigma.ci$sigma)

qqnorm(norm.samples)
qqline(norm.samples)

jarque.bera.test(norm.samples)
## 
## Results of Hypothesis Test
## --------------------------
## 
## Alternative Hypothesis:          
## 
## Test Name:                       Jarque Bera Test
## 
## Data:                            norm.samples
## 
## Test Statistic:                  X-squared = 3.061583
## 
## Test Statistic Parameter:        df = 2
## 
## P-value:                         0.2163644
lillie.test(norm.samples)
## 
## Results of Hypothesis Test
## --------------------------
## 
## Alternative Hypothesis:          
## 
## Test Name:                       Lilliefors (Kolmogorov-Smirnov) normality test
## 
## Data:                            norm.samples
## 
## Test Statistic:                  D = 0.03175149
## 
## P-value:                         0.1214834

As we can see from the results above, the normality tests pass; hence the rejection for our data is not driven by the limited sample size.
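A single simulated sample could of course pass or fail by chance; as a small add-on (not part of the original analysis), the rejection rate of the Jarque-Bera test at this sample size can be estimated by repeating the simulation many times.

# repeat the Jarque-Bera test on many simulated normal samples of size n.obs
# and estimate the rejection rate at the 5% significance level
reps <- 1000
p_vals <- replicate(reps, jarque.bera.test(rnorm(n.obs, mean.obs, sigma))$p.value)
mean(p_vals < 0.05)  # expected to be close to the nominal 0.05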

The next question is:

  • has our time series distribution changed over time?

To help answer this, I use the ecp package (ref. [1]) and its e.divisive() function, which detects changes in distribution, distribution tail shape, mean, variance and correlation.

suppressPackageStartupMessages(library(ecp))
e_div_result <- e.divisive(X = matrix(GSPC_log_returns), sig.lvl = 0.05,
                           R = 199, k = NULL, min.size = 30, alpha = 1)
plot(GSPC_log_returns, type='l')
abline(v = e_div_result$estimates, col = 'red')

e_div_result$estimates
## [1]   1 641
e_div_result$p.values
## [1] 0.35

No change of any type is detected, as the estimated change points coincide with the beginning and the end of the time series.

Conclusions

The null hypothesis of normal distribution is rejected, with or without outlier observations.

That means that the empirical distribution of our observations cannot be well modeled by a normal distribution.

This fact raises questions about how to simulate future log returns.

The good news is that our historical log returns do not show changes in distribution or changes in mean or variance parameters.