Friday, February 24, 2017

Financial products with capital protection barrier - part 10

Model Definition

Abstract

When analysing historical time series of log returns, it is rather common to end up with the following scenario:

  1. the historical log-return observations do not pass the normality tests, and, as a consequence, the null hypothesis of normal distribution is rejected

  2. the simulations of potential future log returns are nevertheless performed on the basis of a normal distribution, using as standard deviation parameter the estimate computed on the available historical data

In this post and the following ones, I investigate the boundary conditions which allow running simulations of normally distributed log returns for non-normal historical time series, with the purpose of investigating their future evolution.

To start with, I will introduce some basic concepts from the fascinating field of Stochastic Calculus, as they can be found in ref. [1].

The hypothesis of normally distributed log returns implies that the stochastic process associated with the share price \(S(t)\) can be expressed as:

\[ \begin{equation} \begin{aligned} S(t)\ =\ S(0)\ e^{(\mu\ -\ \frac{\sigma^2}{2})t\ +\ \sigma B(t)} \\ \end{aligned} \end{equation} \]

The origin of the term \((-\frac{\sigma^2}{2})\) can be found by taking a look at the stochastic exponential definition and its application to Brownian motion.
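Before recapping the theory, here is a minimal R sketch of the model above; all parameter values (mu, sigma, T, N, S(0)) are illustrative, not estimated from data.

# minimal sketch of the model: S(t) = S(0) exp((mu - sigma^2/2) t + sigma B(t))
# all parameters below are hypothetical, for illustration only
set.seed(23)
mu <- 0.05                                          # drift
sigma <- 0.2                                        # volatility
T <- 1                                              # time horizon
N <- 250                                            # number of steps
t <- seq(0, T, length.out=N+1)
B <- c(0, cumsum(rnorm(N, mean=0, sd=sqrt(T/N))))   # standard Brownian path
S <- 100 * exp((mu - sigma^2/2)*t + sigma*B)        # simulated share price, S(0) = 100
plot(t, S, type='l', main = "Simulated Share Price Path")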

Let us recap some basic concepts in the next document section.

Background

I am going to illustrate what ref. [1] states in terms of stochastic exponential, finite quadratic variation and solution to the log-returns stochastic differential equation modeling the stock price.

Let \(X\) have a stochastic differential, and let \(U\) satisfy:

\[ \begin{equation} \begin{cases} dU(t)\ =\ U(t)dX(t) \\ \\ U(0)\ =\ 1 \\ \end{cases} \end{equation} \]

Or equivalently:

\[ \begin{equation} \begin{aligned} U(t)\ =\ 1 + \int_{0}^{t}\ U(s)dX(s) \\ \end{aligned} \end{equation} \]

then \(U\) is called the stochastic exponential of \(X\), and it is denoted by \(\varepsilon(X)\). If \(X(t)\) is of finite quadratic variation (that is, \([X,X](t)\) is finite), then the solution to the above stochastic exponential is given by:

\[ \begin{equation} \begin{aligned} U(t)\ =\ \Large \varepsilon \normalsize (X)\ =\ e^{X(t)\ -\ X(0)\ - \frac{1}{2}[X,X](t)\ } \\ \end{aligned} \end{equation} \]
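For continuous \(X\), this solution can be checked with Ito's formula: writing \(V(t)\ =\ X(t) - X(0) - \frac{1}{2}[X,X](t)\), so that \(U\ =\ e^{V}\), we get

\[ \begin{equation} \begin{aligned} dU\ =\ e^{V}dV\ +\ \frac{1}{2}e^{V}d[V,V](t)\ =\ U(dX\ -\ \frac{1}{2}d[X,X])\ +\ \frac{1}{2}U\ d[X,X]\ =\ U\ dX \\ \end{aligned} \end{equation} \]

since \([V,V]\ =\ [X,X]\), the finite variation term \(\frac{1}{2}[X,X]\) not contributing to the quadratic variation.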

In case \(X(t)\) is a Brownian motion with mean zero and variance \(\sigma^2 t\), that is \(X(t)\ =\ \sigma B(t)\) with \(B\) standard Brownian motion, then \([X,X](t)\ =\ \sigma^2\,[B,B](t)\ =\ \sigma^2 t\) and we have:

\[ \begin{equation} \begin{aligned} \frac{1}{2}[X,X](t)\ = \frac{\sigma^2}{2}\,t \\ \end{aligned} \end{equation} \]

Then, let \(S(t)\) denote the price of a stock and assume that it has a stochastic differential. The return on the stock, \(R(t)\), is defined by the relation:

\[ \begin{equation} \begin{aligned} dR(t)\ = \frac{dS(t)}{S(t)} \\ \end{aligned} \end{equation} \]

Thus

\[ \begin{equation} \begin{aligned} dS(t)\ = S(t)\ dR(t) \\ \end{aligned} \end{equation} \]

and the stock price is the stochastic exponential of the returns. If:

\[ \begin{equation} \begin{aligned} R(t)\ = \mu\ t\ +\ \sigma\ B(t) \\ \end{aligned} \end{equation} \]

the stock price is given by:

\[ \begin{equation} \begin{aligned} S(t)\ = S(0)\ \Large \varepsilon \normalsize(R(t))\ = S(0)\ e^{R(t)\ -\ R(0)\ - \frac{1}{2}[R,R](t)\ }\ =\ S(0)\ e^{(\mu\ - \frac{\sigma^2}{2})t\ +\ \sigma\ B(t)} \\ \end{aligned} \end{equation} \]

In other words:

\[ \begin{equation} \begin{aligned} \ln\left(\frac{S(t)}{S(0)}\right)\ =\ (\mu\ - \frac{\sigma^2}{2})t\ +\ \sigma\ B(t) \\ \end{aligned} \end{equation} \]

The left-hand term is the log return from time 0 to time \(t\), and that is what I intend to simulate. The last formula raises the issue of determining the quadratic variation of our non-normal process, or at least an upper bound for it.

First of all, we have to show that it has finite quadratic variation. The quadratic variation of Brownian motion \(B(t)\) is defined as:

\[ \begin{equation} \begin{aligned} \ [B, B](t)\ :=\ \lim\ \Sigma _{i=1}^{n} |B(t_{i}^{n})\ -\ B(t_{i-1}^{n})|^2 \\ \end{aligned} \end{equation} \]

where, for each n, \(\{t_{i}^{n}\}\ (i=0..n)\) is a partition of the interval [0, t], and the limit is taken over all partitions with \(\delta_{n}=\max_{i}(t_{i}^{n}\ -\ t_{i-1}^{n}) \rightarrow\ 0\)

It results that the quadratic variation of standard Brownian motion over [0, t] is equal to \(t\). To show that, the following basic computations are needed:

\[ \begin{equation} \begin{cases} T_{n}(B)\ :=\ \Sigma _{i=1}^{n} |B(t_{i}^{n})\ -\ B(t_{i-1}^{n})|^2 \\ \\ E(T_{n}(B))\ =\ E(\Sigma _{i=1}^{n} |B(t_{i}^{n})\ -\ B(t_{i-1}^{n})|^2)\ =\ \Sigma _{i=1}^{n}\ (t_{i}^{n}\ -\ t_{i-1}^{n}) =\ t\\ \\ Var(T_{n}(B))\ =\ Var(\Sigma _{i=1}^{n} |B(t_{i}^{n})\ -\ B(t_{i-1}^{n})|^2)\ =\ \Sigma _{i=1}^{n} Var(|B(t_{i}^{n})\ -\ B(t_{i-1}^{n})|^2)\ \le\ 3t\delta_{n} \end{cases} \end{equation} \]
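The two moments above follow from the normality of the increments: for a single increment \(\Delta B\ \sim\ N(0, \Delta)\) we have \(E(|\Delta B|^2)\ =\ \Delta\) and \(E(|\Delta B|^4)\ =\ 3\Delta^2\), hence

\[ \begin{equation} \begin{aligned} Var(|\Delta B|^2)\ =\ 3\Delta^2\ -\ \Delta^2\ =\ 2\Delta^2\ \le\ 3\Delta\ \delta_{n} \\ \end{aligned} \end{equation} \]

and summing over the partition, using \(\Sigma_{i}\ \Delta_{i}\ =\ t\), gives the \(3t\delta_{n}\) bound.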

The third equation states that \(Var(T_{n}(B))\) is bounded by \(3t\delta_{n}\), which vanishes as the partitions get finer. Hence, choosing partitions such that \(\Sigma_{n}\ \delta_{n}\ <\ \infty\), it results:

\[ \begin{equation} \begin{aligned} E(\ \Sigma _{n=1}^{\infty}(T_{n}(B) - E(T_{n}(B)))^2) < \infty \\ \end{aligned} \end{equation} \]

That implies that the series inside the expectation converges almost surely. Hence its terms converge to zero, and therefore:

\[ \begin{equation} \begin{aligned} T_{n}(B) - E(T_{n}(B)) \rightarrow \ 0\ \ \ almost\ surely \end{aligned} \end{equation} \]

Consequently:

\[ \begin{equation} \begin{aligned} T_{n}(B) \rightarrow \ t\ \ \ almost\ surely \end{aligned} \end{equation} \]

In the case of generalised Brownian motion, we have to consider \(\sigma\ B\) in place of \(B\). Denoting the generalised Brownian motion by \(B_{\sigma}\), we have:

\[ \begin{equation} \begin{aligned} E(T_{n}(B_{\sigma}))\ =\ \sigma^2 t \\ \\ Var(T_{n}(B_{\sigma}))\ \le\ 3\sigma^4 t\delta_{n} \\ \end{aligned} \end{equation} \]
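A quick numerical check of both statements in R, summing squared increments of \(\sigma B\) over uniform partitions of [0, t]; sigma and the time horizon are illustrative values.

# numerical check: T_n(B_sigma) approaches sigma^2 * t as the partition gets finer
set.seed(23)
t_horizon <- 1   # illustrative time horizon
sigma <- 0.5     # illustrative sigma
for (n in c(10, 1000, 100000)) {
  dB <- rnorm(n, mean=0, sd=sigma*sqrt(t_horizon/n))   # increments of sigma*B
  cat(sprintf("n = %6d   T_n = %.5f   (sigma^2 * t = %.5f)\n", n, sum(dB^2), sigma^2*t_horizon))
}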

Non-Normality and Stochastic Dominance

Up to now, I have outlined what can be found in ref. [1]. In the remainder of this post, I introduce my own contribution.

Considering a stochastic process \(Y(t)\) which is not normally distributed, I need to find sufficient conditions for \(Y\) to have finite quadratic variation. To that purpose, I require:

\[ \begin{equation} \begin{cases} E(T_{n}(Y))\ :=\ E(\Sigma _{i=1}^{n} |Y(t_{i}^{n})\ -\ Y(t_{i-1}^{n})|^2)\ \le\ E(T_{n}(B_{\sigma}))\ :=\ E(\Sigma _{i=1}^{n} |B_{\sigma}(t_{i}^{n})\ -\ B_{\sigma}(t_{i-1}^{n})|^2)\ =\ \sigma^2 t \\ \\ Var(T_{n}(Y))\ :=\ Var(\Sigma _{i=1}^{n} |Y(t_{i}^{n})\ -\ Y(t_{i-1}^{n})|^2)\ \le\ Var(T_{n}(B_{\sigma}))\ :=\ Var(\Sigma _{i=1}^{n} | B_{\sigma}(t_{i}^{n})\ -\ B_{\sigma}(t_{i-1}^{n})|^2)\ <\ \infty \\ \end{cases} \end{equation} \]

Having defined the following random variables:

\[ \begin{equation} \begin{cases} T_{n}(Y)\ := \Sigma _{i=1}^{n} |Y(t_{i}^{n})\ -\ Y(t_{i-1}^{n})|^2 \\ \\ T_{n}(B_{\sigma})\ := \Sigma _{i=1}^{n} |B_{\sigma}(t_{i}^{n})\ -\ B_{\sigma}(t_{i-1}^{n})|^2 \\ \\ Q_{n}(Y)\ :=\ (\Sigma _{i=1}^{n} |Y(t_{i}^{n})\ -\ Y(t_{i-1}^{n})|^2)^2 \\ \\ Q_{n}(B_{\sigma})\ :=\ (\Sigma _{i=1}^{n} |B_{\sigma}(t_{i}^{n})\ -\ B_{\sigma}(t_{i-1}^{n})|^2)^2 \\ \end{cases} \end{equation} \]

First order stochastic dominance is a sufficient condition for the above two inequalities to be satisfied. That requires:

  1. first order dominance of \(T_{n}(B_{\sigma})\) with respect to \(T_{n}(Y)\)

  2. first order dominance of \(Q_{n}(B_{\sigma})\) with respect to \(Q_{n}(Y)\)

The first order dominance implies:

\[ \begin{equation} \begin{cases} E(T_{n}(Y))\ \le\ E(T_{n}(B_{\sigma})) \\ \\ E(Q_{n}(Y))\ \le\ E(Q_{n}(B_{\sigma})) \\ \end{cases} \end{equation} \]

Hence \(Var(T_{n}(Y))\ <\ \infty\), since \(Var(a) = E(a^2) - (E(a))^2\) and both terms have been shown to be finite (take \(a = T_{n}(Y)\), so that \(a^2 = Q_{n}(Y)\)).
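Spelling the step out, since \(Q_{n}(Y)\ =\ (T_{n}(Y))^2\) by definition:

\[ \begin{equation} \begin{aligned} Var(T_{n}(Y))\ =\ E(Q_{n}(Y))\ -\ (E(T_{n}(Y)))^2\ \le\ E(Q_{n}(B_{\sigma}))\ =\ Var(T_{n}(B_{\sigma}))\ +\ (E(T_{n}(B_{\sigma})))^2\ \le\ 3\sigma^4 t\delta_{n}\ +\ \sigma^4 t^2\ <\ \infty \\ \end{aligned} \end{equation} \]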

As before, the almost sure convergence of \(T_{n}(Y) \rightarrow E(T_{n}(Y))\) follows, and I can conclude that our process \(Y(t)\) has finite quadratic variation.

Having shown that, the following claims sum up my own contribution to this discussion.

Claim #1

Given a stochastic process \(Y\) with zero mean and whose distribution is not normal, if there exists a Brownian motion \(B_{\sigma}\) which satisfies:

  1. first order dominance of \(T_{n}(B_{\sigma})\) with respect to \(T_{n}(Y)\)

  2. first order dominance of \(Q_{n}(B_{\sigma})\) with respect to \(Q_{n}(Y)\)

where \(T_n{}\) and \(Q_n{}\) are defined as:

\[ \begin{equation} \begin{cases} T_{n}(Y)\ := \Sigma _{i=1}^{n} |Y(t_{i}^{n})\ -\ Y(t_{i-1}^{n})|^2 \\ \\ T_{n}(B_{\sigma})\ := \Sigma _{i=1}^{n} |B_{\sigma}(t_{i}^{n})\ -\ B_{\sigma}(t_{i-1}^{n})|^2 \\ \\ Q_{n}(Y)\ :=\ (\Sigma _{i=1}^{n} |Y(t_{i}^{n})\ -\ Y(t_{i-1}^{n})|^2)^2 \\ \\ Q_{n}(B_{\sigma})\ :=\ (\Sigma _{i=1}^{n} |B_{\sigma}(t_{i}^{n})\ -\ B_{\sigma}(t_{i-1}^{n})|^2)^2 \\ \end{cases} \end{equation} \]

whereby, for each n, \(\{t_{i}^{n}\}\ (i=0..n)\) is a partition of the interval [0, t], then the process \(Y\) has finite quadratic variation on the interval [0, t], with upper bound equal to \(\sigma^2 t\), for some \(\sigma\) value to be determined.

Among all the \(\sigma\) values satisfying this claim, we choose the minimum one to be used for simulation.

Claim #2

A sufficient condition for the first order dominance hypothesis of claim #1 is the following. Considering the increments of the Y and B processes, \(\Delta Y\) and \(\Delta B\), and their probability density functions \(f_{\Delta Y}(X)\) and \(f_{\Delta B}(X)\), we have:

\[ \begin{equation} \begin{aligned} Prob(\Delta B^2 > C^2)\ = \int_{-\infty}^{-C}f_{\Delta B}(u)du + \int_{C}^{+\infty}f_{\Delta B}(u)du \\ Prob(\Delta Y^2 > C^2)\ = \int_{-\infty}^{-C}f_{\Delta Y}(u)du + \int_{C}^{+\infty}f_{\Delta Y}(u)du \end{aligned} \end{equation} \] where \(\Delta B\) has a normal distribution and \(\Delta Y\) has a known empirical distribution. In order to have:

\[ \begin{equation} \begin{aligned} Prob(\Delta B^2 > C^2)\ > Prob(\Delta Y^2 > C^2) \\ \end{aligned} \end{equation} \] a sufficient condition is:

\[ \begin{equation} \begin{aligned} \exists\ \sigma_{0}\ >\ 0\ |\ f_{\Delta B_{\Large\sigma_{0}}}(X)\ >\ f_{\Delta Y}(X)\ \ \ \ \forall\ X \\ \end{aligned} \end{equation} \]

Rephrasing this statement: a Brownian motion can be found whose increment density function stays above the density function of the increments of the process \(Y\) under analysis, for every value \(X\) of the random variable.
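A small numerical sketch of the two tail probabilities of claim #2, with purely illustrative values and a Student-t distribution standing in for the empirical \(\Delta Y\):

# illustrative tail probabilities: Delta B ~ N(0, sigma0^2), Delta Y ~ Student-t (a stand-in)
C <- 1.5                    # hypothetical threshold
sigma0 <- 1.2               # hypothetical standard deviation of the Brownian increment
p_B <- 2*pnorm(-C/sigma0)   # Prob(Delta B^2 > C^2), by symmetry of the normal density
p_Y <- 2*pt(-C, df=5)       # Prob(Delta Y^2 > C^2), by symmetry of the t density
c(p_B=p_B, p_Y=p_Y)         # here p_B > p_Y: the dominance inequality holds at this C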

Claim #3

For a fractional ARMA process with parameter \(d\) satisfying \(0 < d < 0.5\) (a long-memory process), a Brownian motion satisfying the first order dominance conditions of claim #1 does not exist.

That is a consequence of the fact that a fractional ARMA process with parameter \(d\), \(0 < d < 0.5\), has infinite quadratic variation (see ref. [4], §6.1.2).

Hence the inequalities of claim #1 cannot be satisfied, because Brownian motion has finite quadratic variation: that finiteness would be contradicted if claim #1 were true for \(Y\) a fractional ARMA process with parameter \(d\), \(0 < d < 0.5\).

For a fractional ARMA process, claim #2 is false: two infinite, continuous sets of values of X (one on the left tail side, the other on the right) can be determined so that the inequality \(f_{\Delta B_{\sigma}}(X)\ \ge\ f_{\Delta Y}(X)\) is false.

Empirical Verification

In this section, I am going to run an empirical verification of the claims above, based on the following steps:

  1. compute log returns standard deviation confidence interval

  2. generate \(T_{n}(B_{\sigma})\), \(Q_{n}(B_{\sigma})\), \(T_{n}(Y)\), \(Q_{n}(Y)\), where \(B_{\sigma}\) is the generalised Brownian random walk and Y is the empirical log-return random walk, at a given, shared \(\sigma\) value

  3. compare the cumulative distributions of what is obtained at step #2

When stochastic dominance occurs, the cumulative distribution (CDF) curves of \(T_{n}(B_{\sigma})\) and \(Q_{n}(B_{\sigma})\) are expected to stay below those of \(T_{n}(Y)\) and \(Q_{n}(Y)\) respectively, for each given X; indeed, \(F_{B}(X)\ \le\ F_{Y}(X)\) for all X is equivalent to \(Prob(B > X)\ \ge\ Prob(Y > X)\) for all X.

In order to generate the random walks, the increments of the generalised Brownian motion are modelled with a normal distribution generator configured as follows (ref. [5]):

\[ \begin{equation} \begin{cases} \Delta t\ = \Large \frac{T}{N} \\ \\ Norm(0,\Delta t)\ = Norm(0, \sigma^2) \\ \end{cases} \end{equation} \]

Hence \(\sigma^2\ =\ \Delta t\ =\ \Large \frac{T}{N}\), with T and N given, so that \(\sigma\) is constant. I will then collect values of \(T_{n}()\) and \(Q_{n}()\) for n in the range \([1..N]\), and compute the corresponding empirical cumulative distributions of the results. Having fixed the \(\sigma\) value while \(n\) varies from 1 to \(N\), the time horizon increases as \(\Delta T\ =\ \sigma^2\ n\).

I will then collect the \(T_{n}()\) and \(Q_{n}()\) ECDF values for both the normal and the empirical processes and compare them to check for stochastic dominance.

That is different from the limits in the Background section equations, where the time intervals are made smaller by increasing \(n\); however, it is consistent with what is to be verified, namely the distribution of the \(T_{n}()\) and \(Q_{n}()\) stochastic processes as n varies.

Having said that, I am going to compute the standard deviation confidence interval of our original observations.
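The interval is the standard \(\chi^2\)-based confidence interval for the standard deviation of a normally distributed sample (used here as an approximation, given the observed non-normality): with \(s\) the sample standard deviation of \(n\) observations,

\[ \begin{equation} \begin{aligned} \sqrt{\frac{(n-1)s^2}{\chi^{2}_{1-\alpha/2,\ n-1}}}\ \le\ \sigma\ \le\ \sqrt{\frac{(n-1)s^2}{\chi^{2}_{\alpha/2,\ n-1}}} \\ \end{aligned} \end{equation} \]

which the code implements through qchisq().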

load(file="structured-product-3.RData")
invisible(lapply(ts.package, function(x) {
  suppressPackageStartupMessages(library(x, character.only=TRUE)) }))

# setting seed, observations vector and its length
my_seed <- 23
set.seed(my_seed)
obs <- as.vector(GSPC_log_returns)
(n <- length(obs))
## [1] 640
# observations standard deviation and its confidence interval
sigma <- sd(obs)

# 95% confidence interval
sigma_low <- sqrt((n-1)*sigma^2/qchisq(.975, df=n-1))
sigma_up <- sqrt((n-1)*sigma^2/qchisq(.025, df=n-1))
confint_95 <- c(sigma_low, sigma, sigma_up)

# 99% confidence interval
sigma_low <- sqrt((n-1)*sigma^2/qchisq(.995, df=n-1))
sigma_up <- sqrt((n-1)*sigma^2/qchisq(.005, df=n-1))
confint_99 <- c(sigma_low, sigma, sigma_up)

type <- c("95%", "99%")
df_confint <- data.frame(rbind(confint_95, confint_99))
df_confint <- data.frame(cbind(type, df_confint))
colnames(df_confint) <- c("confint.percentage", "confint.low", "confint.mean", "confint.high")
kable(df_confint, caption = "standard deviation confidence interval")
standard deviation confidence interval

             confint.percentage  confint.low  confint.mean  confint.high
confint_95   95%                 0.0055372    0.0058406     0.0061794
confint_99   99%                 0.0054464    0.0058406     0.0062916

I am now going to generate a set of values based on the empirical distribution, as many as the original observations, in order to compare their density distribution and the associated walks.

# generating values based on empirical distribution
emp_value <- simulateVector(n, distribution="emp", seed = my_seed, param.list=list(obs=obs), sample.method='LHS')
acf(emp_value)

suppressWarnings(LjungBoxTest(emp_value, lag.max=10))
##   m   Qm    pvalue
##   1 0.01 0.9289345
##   2 0.70 0.7059472
##   3 1.82 0.6103120
##   4 3.39 0.4942991
##   5 3.68 0.5970665
##   6 3.77 0.7081791
##   7 4.05 0.7742745
##   8 5.39 0.7149767
##   9 5.41 0.7972082
##  10 5.42 0.8615639

The ACF plot above and the Ljung-Box test show that our empirical-distribution-based generation produces uncorrelated values, as required.

par(mfrow=c(2,2))

# walk based on original observations 
obs_walk <- diffinv(obs)

# walk based on values generated based on empirical distribution
emp_walk_value <- diffinv(emp_value)  

plot(density(obs), main = "Observations Distribution Density")
abline(v=0, col="red", lty="dotted")
plot(obs_walk, type='l', main = "Observations Walk")

plot(density(emp_value), main = "Empirical Process Distribution Density")
abline(v=0, col="red", lty="dotted")
plot(emp_walk_value, type='l', main = "Empirical Process Walk")

Based on the plots above, we can observe how closely the values generated from the empirical distribution resemble the original ones.

Now, I generate 20000 samples based on the empirical distribution and on the normal one. I am going to use the sigma_up value as standard deviation to see if it is enough for stochastic dominance to be verified; however, I am not stating that it is the value to choose. A specific procedure for that will be shown later.

# number of generated values
n.gen <- 20000

# generating values based on empirical distribution
emp_value <- simulateVector(n.gen, distribution="emp", seed = my_seed, param.list=list(obs=obs), sample.method='LHS')

# generating normal distributed value
norm_value <- rnorm(n.gen, 0, sigma_up) # normal values

ACF plots and the Ljung-Box test can confirm the samples are uncorrelated; their outputs are omitted, but the check is sketched below.
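For completeness, here is a sketch of that check, mirroring the one run earlier on the smaller sample.

# correlation diagnostics on both generated samples (outputs omitted)
acf(emp_value)
acf(norm_value)
suppressWarnings(LjungBoxTest(emp_value, lag.max=10))
suppressWarnings(LjungBoxTest(norm_value, lag.max=10))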

par(mfrow=c(2,2))

# walk based on empirical distribution generated values 
emp_walk_value <- diffinv(emp_value) 
plot(density(emp_value), main = "Empirical Process Distribution Density")
abline(v=0, col="red", lty="dotted")
plot(emp_walk_value, type='l', main = "Empirical Process Walk")

# walk based on normal values
brown_walk_value <- diffinv(norm_value) 
plot(density(norm_value), main = "Normal Distribution Density")
abline(v=0, col="red", lty="dotted")
plot(brown_walk_value, type='l', main = "(Brownian) Random Walk")

There is a remarkable difference between the walk of the values generated from the empirical distribution and the Brownian walk.

Based on the empirical and Brownian walks, I compute \(T_{n}(Y)\), \(T_{n}(B_{\sigma})\), \(Q_{n}(Y)\), \(Q_{n}(B_{\sigma})\) and show plots to compare them.

TnY <- diffinv(na.omit(diff(emp_walk_value))^2) # T_{n}(Y)
TnB <- diffinv(na.omit(diff(brown_walk_value))^2) # T_{n}(B(mu, sigma))
QnY <- TnY^2
QnB <- TnB^2
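As a side note, since diff() applied to diffinv(x) recovers x, the TnY line above just cumulates the squared increments; a quick equivalence check (the TnY_alt helper is mine, not part of the original procedure):

TnY_alt <- diffinv(emp_value^2)  # c(0, cumsum(emp_value^2))
all.equal(TnY, TnY_alt)          # TRUE: same cumulative sum of squared increments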

par(mfrow=c(1,2))

plot(ecdf(TnY), col= 'red', main = "Tn(Y) (red) and Tn(B) (blue) ECDF")
lines(ecdf(TnB), col='blue')

plot(ecdf(QnY), col= 'red', main = "Qn(Y) (red) and Qn(B) (blue) ECDF")
lines(ecdf(QnB), col='blue')

As we can see above, the blue ECDF curves (associated with \(B_{\sigma}\)) stay below the red ones (associated with \(Y\)), highlighting the stochastic dominance of \(B_{\sigma}\) over \(Y\).

I now compare the \(T_{n}(.)\) and \(Q_{n}(.)\) cumulative distributions to determine the fractions of dominant values.

max_Tn <- max(c(TnY, TnB))
seq_v <- seq(0, max_Tn, length.out=n.gen)

fun.ecdf.TnY <- ecdf(TnY)
my.ecdf.TnY <- fun.ecdf.TnY(seq_v)
fun.ecdf.TnB <- ecdf(TnB)
my.ecdf.TnB <- fun.ecdf.TnB(seq_v)

max_Qn <- max(c(QnY, QnB))
seq_v <- seq(0, max_Qn, length.out = n.gen)

fun.ecdf.QnY <- ecdf(QnY)
my.ecdf.QnY <- fun.ecdf.QnY(seq_v)
fun.ecdf.QnB <- ecdf(QnB)
my.ecdf.QnB <- fun.ecdf.QnB(seq_v)

# dominance fractions
sum(my.ecdf.TnY > my.ecdf.TnB)/length(my.ecdf.TnB)
## [1] 0.99975
sum(my.ecdf.QnY > my.ecdf.QnB)/length(my.ecdf.QnB)
## [1] 0.9999

The last two numbers are respectively:

  • the fraction of \(ECDF(T_{n}(Y))\) values greater than the \(ECDF(T_{n}(B_{\sigma}))\) ones, in other words the fraction of X values where \(T_{n}(B_{\sigma})\) is dominant,

  • the fraction of \(ECDF(Q_{n}(Y))\) values greater than the \(ECDF(Q_{n}(B_{\sigma}))\) ones, in other words the fraction of X values where \(Q_{n}(B_{\sigma})\) is dominant.

For those X values, the following applies:

\[ \begin{equation} \begin{aligned} Prob(T_{n}(B_{\sigma}) > X)\ >\ Prob(T_{n}(Y) > X) \ \\ Prob(Q_{n}(B_{\sigma}) > X)\ >\ Prob(Q_{n}(Y) > X) \ \end{aligned} \end{equation} \]

As we can see, using \(\sigma\) equal to the upper bound of its confidence interval yields a generous fraction of dominant values, above 99%.

I then countercheck the \(\Delta t\) used in the Brownian motion generation by comparing T with the value of \(T_{n}(B_{\sigma})\) at the end of the time horizon.

# the formulas above say that: sigma^2 = dt = T/n
(T <- sigma_up^2*n.gen) # T = sigma^2*n
## [1] 0.7916741
TnB[n.gen]              # = Sum{(T/n)} = n*(T/n) = T
## [1] 0.7867119
TnB[n.gen]*(T/n.gen)    # (sigma^2*n)*(dt) = sigma^2*T
## [1] 3.114097e-05
sigma_up^2*T
## [1] 3.13374e-05

The \(T\) value and the \(T_{n}(B_{\sigma})\) end-of-horizon value are basically equal, which confirms the consistency of the simulation.

Now I repeat the same comparison with a fractional ARMA process taken as the empirical process. I will consider two fractional ARMA processes, with \(d\) parameter values close to the boundaries of the interval (0, 0.5).

emp_value_frac_1 <- fracdiff.sim(n.gen, d=0.05, sd=sigma_up)$series
emp_value_frac_2 <- fracdiff.sim(n.gen, d=0.45, sd=sigma_up)$series

emp_walk_value_frac_1 <- diffinv(emp_value_frac_1) # random walk based on empirical values 
emp_walk_value_frac_2 <- diffinv(emp_value_frac_2) # random walk based on empirical values 

TnY_frac_1 <- diffinv(na.omit(diff(emp_walk_value_frac_1))^2) # T_{n}(Y), from the walk increments
TnY_frac_2 <- diffinv(na.omit(diff(emp_walk_value_frac_2))^2) # T_{n}(Y), from the walk increments

QnY_frac_1 <- TnY_frac_1^2
QnY_frac_2 <- TnY_frac_2^2

par(mfrow=c(3,2))

plot(density(emp_value_frac_1), main = "Fractional ARMA Distribution Density (d=0.05)")
plot(emp_walk_value_frac_1, type='l', main = "Fractional ARMA Walk (d=0.05)")

plot(density(emp_value_frac_2), main = "Fractional ARMA Distribution Density (d=0.45)")
plot(emp_walk_value_frac_2, type='l', main = "Fractional ARMA Walk (d=0.45)")

plot(density(norm_value), main = "Normal Distribution Density")
plot(brown_walk_value, type='l', main = " (Brownian) Random Walk")

par(mfrow=c(1,2))

plot(ecdf(TnY_frac_1), col= 'red', main = "Tn(Y) (red) and Tn(B) (blue) ECDF")
lines(ecdf(TnY_frac_2), col= 'red')
lines(ecdf(TnB), col='blue')

plot(ecdf(QnY_frac_1), col= 'red', main = "Qn(Y) (red) and Qn(B) (blue) ECDF")
lines(ecdf(QnY_frac_2), col= 'red')
lines(ecdf(QnB), col='blue')

max_Tn_frac_1 <- max(c(TnY_frac_1, TnB))
seq_v <- seq(0, max_Tn_frac_1, length.out=n.gen)
fun.ecdf.TnY_frac_1 <- ecdf(TnY_frac_1)
my.ecdf.TnY_frac_1 <- fun.ecdf.TnY_frac_1(seq_v)

max_Tn_frac_2 <- max(c(TnY_frac_2, TnB))
seq_v <- seq(0, max_Tn_frac_2, length.out=n.gen)
fun.ecdf.TnY_frac_2 <- ecdf(TnY_frac_2)
my.ecdf.TnY_frac_2 <- fun.ecdf.TnY_frac_2(seq_v)

max_Qn_frac_1 <- max(c(QnY_frac_1, QnB))
seq_v <- seq(0, max_Qn_frac_1, length.out = n.gen)
fun.ecdf.QnY_frac_1 <- ecdf(QnY_frac_1)
my.ecdf.QnY_frac_1 <- fun.ecdf.QnY_frac_1(seq_v)

max_Qn_frac_2 <- max(c(QnY_frac_2, QnB))
seq_v <- seq(0, max_Qn_frac_2, length.out = n.gen)
fun.ecdf.QnY_frac_2 <- ecdf(QnY_frac_2)
my.ecdf.QnY_frac_2 <- fun.ecdf.QnY_frac_2(seq_v)

sum(my.ecdf.TnY_frac_1 > my.ecdf.TnB)/length(my.ecdf.TnB)
## [1] 0.12185
sum(my.ecdf.QnY_frac_1 > my.ecdf.QnB)/length(my.ecdf.QnB)
## [1] 0.18625
sum(my.ecdf.TnY_frac_2 > my.ecdf.TnB)/length(my.ecdf.TnB)
## [1] 0.12225
sum(my.ecdf.QnY_frac_2 > my.ecdf.QnB)/length(my.ecdf.QnB)
## [1] 0.199

From the plots it can now be observed that the blue ECDF curves (associated with B) stay above the red ones (associated with Y), highlighting that there is no stochastic dominance of B over Y in this case.

Further, the fractions of dominant values of the cumulative distributions are very small.

In conclusion, the stochastic dominance of \(T_{n}(B_{\sigma})\) with respect to \(T_{n}(Y)\) and of \(Q_{n}(B_{\sigma})\) with respect to \(Q_{n}(Y)\) is not verified when \(Y\) is a fractional ARMA process.

Now, I am going to determine the minimum \(\sigma\) value required to have 99% dominance of \(T_{n}(B_{\sigma})\) and \(Q_{n}(B_{\sigma})\) against \(T_{n}(Y)\) and \(Q_{n}(Y)\), where Y is the empirical process of our log-return observations.

dominant_values_compute <- function(sigma_value) {
  norm_value <- rnorm(n.gen, 0, sigma_value) # normal values
  brown_walk_value <- diffinv(norm_value) # random walk based on normal values

  TnB <- diffinv(na.omit(diff(brown_walk_value))^2) # T_{n}(B(mu, sigma))
  QnB <- TnB^2
  
  max_Tn <- max(c(TnY, TnB))
  seq_v <- seq(0, max_Tn, length.out=n.gen)
  fun.ecdf.TnY <- ecdf(TnY)
  my.ecdf.TnY <- fun.ecdf.TnY(seq_v)
  fun.ecdf.TnB <- ecdf(TnB)
  my.ecdf.TnB <- fun.ecdf.TnB(seq_v)

  max_Qn <- max(c(QnY, QnB))
  seq_v <- seq(0, max_Qn, length.out = n.gen)
  fun.ecdf.QnY <- ecdf(QnY)
  my.ecdf.QnY <- fun.ecdf.QnY(seq_v)
  fun.ecdf.QnB <- ecdf(QnB)
  my.ecdf.QnB <- fun.ecdf.QnB(seq_v)

  # dominant values fractions
  c("TnY_gt_TnB" = sum(my.ecdf.TnY > my.ecdf.TnB)/length(my.ecdf.TnB),
    "QnY_gt_QnB" = sum(my.ecdf.QnY > my.ecdf.QnB)/length(my.ecdf.QnB))
}

# defining sequence of sigma values
seq_sigma <- seq(sigma_low, sigma_up, length.out=20)
seq_sigma <- c(sigma, seq_sigma)
seq_sigma <- sort(seq_sigma)

# running n.comp computations and averaging results
results <- matrix(0, nrow=length(seq_sigma), ncol=2)
n.comp <- 100
for (i in 1:n.comp) {
    results <- results + t(sapply(seq_sigma, function(x) { dominant_values_compute(x) }))
}
results <- results/n.comp

df_dominance <- data.frame(sigma = seq_sigma, results = results)
colnames(df_dominance) <- c("sigma", "TnY_gt_TnB", "QnY_gt_QnB")
kable(df_dominance)
sigma       TnY_gt_TnB   QnY_gt_QnB
0.0054464   0.0109870    0.0001580
0.0054909   0.0109845    0.0001585
0.0055354   0.0117805    0.0001855
0.0055799   0.0130495    0.0002420
0.0056243   0.0145020    0.0002715
0.0056688   0.0169155    0.0004120
0.0057133   0.0194025    0.0005245
0.0057578   0.0241840    0.0016125
0.0058023   0.0323905    0.0079980
0.0058406   0.0727990    0.0680455
0.0058467   0.0925355    0.0994210
0.0058912   0.3001760    0.3912190
0.0059357   0.5585580    0.7252215
0.0059802   0.6801335    0.8314655
0.0060247   0.7706020    0.8915670
0.0060692   0.8614080    0.9415935
0.0061136   0.9321955    0.9760040
0.0061581   0.9617390    0.9882100
0.0062026   0.9854285    0.9962040
0.0062471   0.9934085    0.9984595
0.0062916   0.9954160    0.9991345

The last table of values reports for each row:

  • the \(\sigma\) value used for Brownian motion generation

  • the fraction of \(Tn(Y)\) ECDF values greater than \(Tn(B_{\sigma})\) ECDF ones at given X, in other words, the fraction of X values where:

\[ \begin{equation} \begin{aligned} Prob(T_{n}(B_{\sigma}) > X)\ >\ Prob(T_{n}(Y) > X) \end{aligned} \end{equation} \]

  • the fraction of \(Qn(Y)\) ECDF values greater than \(Qn(B_{\sigma})\) ECDF ones at given X, in other words, the fraction of X values where:

\[ \begin{equation} \begin{aligned} Prob(Q_{n}(B_{\sigma}) > X)\ >\ Prob(Q_{n}(Y) > X) \end{aligned} \end{equation} \]

To have at least 99% dominance, it is necessary to use \(\sigma\) values satisfying:

\[ \begin{equation} \begin{aligned} \sigma\ \ge\ 0.0062471 \\ \end{aligned} \end{equation} \]

Please note that this interval does not include \(\sigma\) = 0.0058406, the value computed by the sd() function on the empirical observations.
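The same threshold can be extracted programmatically from the scan results; the one-liner below is a convenience of mine, not part of the original procedure.

# minimum sigma in the scanned grid with at least 99% dominance on both fractions
min(df_dominance$sigma[df_dominance$TnY_gt_TnB >= 0.99 & df_dominance$QnY_gt_QnB >= 0.99])
# 0.0062471 (rounded), per the table above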

Last, I show the \(\sigma^2\) values as computed based on:

  • the empirical estimated value \(\sigma\)

  • the stochastic dominance \(\sigma\) values interval

We have:

\[ \begin{equation} \begin{aligned} \sigma^2 = 0.0058406^2 = 3.4112271\times 10^{-5} \\ \end{aligned} \end{equation} \]

to be compared with the 99% dominance lower bound:

\[ \begin{equation} \begin{aligned} \sigma^2\ \ge\ 3.9025971\times 10^{-5} \\ \end{aligned} \end{equation} \]

Conclusions

Log-return spatial distributions are frequently found to be non-normal, sometimes with considerable peakedness and skewness. Nevertheless, normal distribution generators are typically used for Monte Carlo simulations of future prices.

Further, the standard deviation lowers the log-return average through the term \((-\frac{\sigma^2}{2} t)\), which is related to the quadratic variation of the generalised Brownian process.

In this post, I have shown sufficient conditions for non-normal distributions to have finite quadratic variation. My claims have been verified empirically, and an upper bound of the empirical distribution quadratic variation has been computed.

The \(\sigma\) value to be used in determining the upper bound of the quadratic variation of the observations' empirical process has been computed with a 99% stochastic dominance confidence.

Updating the saved environment data with the sigma dominance value.

(sigma_dom <- df_dominance[20,1])
## [1] 0.006247077
save(ts.package, GSPC_log_returns, GSPC_AdjClose, outliers_l, outliers_r, sigma_dom, file="structured-product-4.RData")