around-R: TCP reloaded (part 2)

Exploratory Analysis - step #1

Abstract

In this post I am going to start the exploratory analysis of the data collected by the simulation outlined in my previous post. Specifically, I will identify the different phases through which the TCP variables evolve. Distinguishing these phases makes it easier to highlight the statistical dispersion of the collected variables, since each observation can then be associated with a specific context.

Analysis

As a first step, I load the data generated by the TCP simulation.

suppressPackageStartupMessages(library(ggplot2))  # plotting
suppressPackageStartupMessages(library(Rmisc))    # multiplot()
suppressPackageStartupMessages(library(knitr))    # kable() tables

df <- read.csv("TCP.csv", header=TRUE, stringsAsFactors = FALSE, sep=",")
dim(df)

## [1] 2501    7

kable(head(df,5))

time	W	R	q	x	S
0.00	2	0.2000000	0.0	0.0000000	10.000000
0.02	2	0.2066667	0.2	0.0000000	9.677419
0.04	2	0.2000000	0.0	0.0015038	10.000000
0.06	2	0.2066667	0.2	0.0014925	9.677419
0.08	2	0.2000000	0.0	0.0029851	10.000000

kable(tail(df,5))

	time	W	R	q	x	p	S
2497	49.92	27.41972	0.9893418	23.68025	21.86931	0.0463816	27.71512
2498	49.94	27.25502	0.9878185	23.63456	21.88293	0.0467328	27.59112
2499	49.96	27.08724	0.9862126	23.58638	21.89610	0.0470732	27.46593
2500	49.98	26.91647	0.9845232	23.53570	21.90881	0.0474025	27.33960
2501	50.00	26.74279	0.9827496	23.48249	21.92104	0.0477203	27.21221

Let us plot again the time evolution of the collected variables W, R, S, q, x and p.

p1 <- ggplot(data=df, aes(x=time, y = W)) + geom_line() + ggtitle("TCP sender window size")
p2 <- ggplot(data=df, aes(x=time, y = R)) + geom_line() + ggtitle("TCP round-trip-time")
p3 <- ggplot(data=df, aes(x=time, y = S)) + geom_line() + ggtitle("TCP transmission rate")
p4 <- ggplot(data=df, aes(x=time, y = q)) + geom_line() + ggtitle("FIFO buffer queue length")
p5 <- ggplot(data=df, aes(x=time, y = x)) + geom_line() + ggtitle("FIFO buffer queue EWMA")
p6 <- ggplot(data=df, aes(x=time, y = p)) + geom_line() + ggtitle("RED drop probability")
multiplot(p1, p2, p3, p4, p5, p6, cols=2)   # arrange the six plots in two columns

Next, I identify the tick at which slow start ends and congestion avoidance begins. I take it to be the tick at which the transmission rate S reaches its maximum.

The congestion avoidance timeline is further split into two time windows:

the first one goes from the end of slow start to the first multiplicative window decrease event, located in the code as the first local minimum of S after its peak;
the second one starts at that event and lasts until the end of the TCP simulation.

(S.max <- max(df$S))

## [1] 41.94982

(S.max.t <- which(df$S == S.max))

## [1] 39

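tick1 <- S.max.t                     # last tick of slow start
status <- rep("slowstart", S.max.t)
S2 <- df$S[-c(1:S.max.t)]            # transmission rate after the peak
S2.diff <- sign(diff(S2))            # -1 decreasing, +1 increasing
S2.diff2 <- diff(S2.diff)            # +2 where S turns from decreasing to increasing
(tick.min <- which(S2.diff2 == 2)[1])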

## [1] 820

(tick2 <- tick1 + tick.min)

## [1] 859

status <- c(status, rep("ca_transient", tick.min))             # ticks tick1+1 .. tick2
status <- c(status, rep("ca_steadystate", nrow(df)-tick2))     # remaining ticks
df$status <- factor(status)
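The steps above can also be packaged into a small reusable function. The sketch below is not part of the original analysis: the name label_tcp_phases is hypothetical, it uses which.max() (which returns only the first maximum, whereas which(df$S == S.max) returns every tied maximum), and it orders the factor levels chronologically rather than alphabetically as factor(status) does. It assumes that the global maximum of S marks the end of slow start and that a strict local minimum follows it.

```r
# Hypothetical helper (not part of the original post): labels each tick as
# slow start, congestion avoidance transient or congestion avoidance steady state.
# Assumptions: S is regularly sampled, its first maximum ends slow start,
# and a strict local minimum of S follows that peak.
label_tcp_phases <- function(S) {
  tick1 <- which.max(S)                   # end of slow start (first maximum)
  S2 <- S[-seq_len(tick1)]                # samples after the peak
  turns <- diff(sign(diff(S2)))           # +2 where S turns from decreasing to increasing
  tick.min <- which(turns == 2)[1]
  if (is.na(tick.min)) stop("no local minimum found after the peak")
  tick2 <- tick1 + tick.min
  status <- c(rep("slowstart", tick1),
              rep("ca_transient", tick.min),
              rep("ca_steadystate", length(S) - tick2))
  # chronological level order instead of the alphabetical default
  factor(status, levels = c("slowstart", "ca_transient", "ca_steadystate"))
}

# Toy series: rises to a peak, falls to a minimum, then oscillates.
S.toy <- c(1, 3, 6, 5, 4, 3, 4, 5, 4)
label_tcp_phases(S.toy)
```

Applied to the simulation data, label_tcp_phases(df$S) should reproduce the status column computed above, apart from the order of the factor levels.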

I plot the time evolution of S, W, R, q, x and p again, coloring each curve by the status variable computed above.

p.col <- c("#FF6666", "#6644FF", "#22FF22")
ggplot(data=df, aes(x=time, y = S, color = status)) + geom_line() +
    scale_color_manual(values=p.col) + ggtitle("TCP transmission rate")

ggplot(data=df, aes(x=time, y = W, color = status)) + geom_line() +
    scale_color_manual(values=p.col) + ggtitle("TCP sender window size")

ggplot(data=df, aes(x=time, y = R, color = status)) + geom_line() +
    scale_color_manual(values=p.col) + ggtitle("TCP round-trip-time")

ggplot(data=df, aes(x=time, y = q, color = status)) + geom_line() +
    scale_color_manual(values=p.col) + ggtitle("FIFO buffer queue length")

ggplot(data=df, aes(x=time, y = x, color = status)) + geom_line() +
    scale_color_manual(values=p.col) + ggtitle("FIFO buffer queue EWMA")

ggplot(data=df, aes(x=time, y = p, color = status)) + geom_line() +
    scale_color_manual(values=p.col) + ggtitle("RED drop probability")

The plots above clearly capture what happens during the three phases of TCP behavior: slow start, the congestion avoidance transient and the congestion avoidance steady state.

In our simulation scenario, all the collected variables show a clear seasonal (periodic) pattern during the congestion avoidance steady-state time window.
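One possible way to quantify this periodicity, not used in the original post, is to take the lag of the first local maximum of the sample autocorrelation function. The sketch below assumes a regularly sampled, roughly periodic series; estimate_period is a hypothetical helper, checked here on a synthetic sinusoid with a period of 50 ticks.

```r
# Sketch (not from the original post): estimate the dominant period of a
# series as the lag of the first local maximum of its sample autocorrelation.
# Assumption: the series is regularly sampled and roughly periodic.
estimate_period <- function(x, lag.max = length(x) %/% 2) {
  r <- acf(x, lag.max = lag.max, plot = FALSE)$acf[-1]  # drop lag 0, so r[k] is lag k
  peaks <- which(diff(sign(diff(r))) == -2) + 1          # lags of local maxima of r
  if (length(peaks) == 0) return(NA_integer_)
  peaks[1]
}

# Synthetic check: a sinusoid with a period of 50 ticks.
x.toy <- sin(2 * pi * (1:1000) / 50)
estimate_period(x.toy)
```

On the simulation data one could call, for example, estimate_period(df$S[df$status == "ca_steadystate"]) and multiply the result by the 0.02 s tick length to get an estimate of the period in seconds.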

Additionally, I want to identify:

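when the TCP transmission rate is increasing or decreasing;
when the TCP transmission rate changes quickly or slowly, i.e. when the absolute change between consecutive ticks is above or below a threshold (0.01 in the code below).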

S.diff <- diff(df$S)                                          # change of S between consecutive ticks
S.diff.sign <- sign(S.diff)
S_slope <- ifelse(S.diff.sign > 0, "increase", "decrease")    # a zero change counts as "decrease"
df$S_slope <- c(S_slope, S_slope[length(S_slope)])            # repeat the last label to match nrow(df)
df$S_slope <- factor(df$S_slope)

S_slope_rate <- ifelse(abs(S.diff) > 0.01, "high", "low")     # threshold on |change| per tick
df$S_slope_rate <- c(S_slope_rate, S_slope_rate[length(S_slope_rate)])
df$S_slope_rate <- factor(df$S_slope_rate)
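For reuse on other series, the same labeling can be wrapped in a function. This is a hypothetical helper, not part of the original code: like the code above, it labels a zero change as "decrease", pads the result by repeating the last difference, and keeps the 0.01 threshold, which applies to the change of S between consecutive ticks (0.02 s apart in this dataset).

```r
# Hypothetical helper (not in the original post) reproducing the slope
# labeling above for any numeric vector of length >= 2.
slope_labels <- function(S, threshold = 0.01) {
  d <- diff(S)
  d <- c(d, d[length(d)])   # pad: repeat the last difference so length(d) == length(S)
  data.frame(
    S_slope      = factor(ifelse(d > 0, "increase", "decrease")),   # zero -> "decrease"
    S_slope_rate = factor(ifelse(abs(d) > threshold, "high", "low"))
  )
}

slope_labels(c(10, 10.5, 10.505, 10.2))
```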

The plots below check that the new labels are correct.

ggplot(data = df, aes(x = time, y = S, color = S_slope)) + geom_point() + ggtitle("TCP transmission rate")

ggplot(data = df, aes(x = time, y = S, color = S_slope_rate)) + geom_point() + ggtitle("TCP transmission rate")

As a result, the original TCP dataframe has been augmented with three new columns, status, S_slope and S_slope_rate, whose categorical values indicate the conditions in place at each tick.

Data summaries conditioned on these new variables can therefore be more informative.
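As a minimal sketch of such a conditioned summary, base R's aggregate() and table() can be used. The toy data frame below is invented purely for illustration and is not simulation output; with the real data one would use df in its place.

```r
# Sketch (not from the original post): summaries of S conditioned on status.
# The values below are made up for illustration only.
toy <- data.frame(
  S      = c(10, 20, 40, 30, 25, 28, 26, 27),
  status = factor(c("slowstart", "slowstart", "slowstart",
                    "ca_transient", "ca_transient",
                    "ca_steadystate", "ca_steadystate", "ca_steadystate"))
)

# Mean and standard deviation of S within each status level.
aggregate(S ~ status, data = toy, FUN = function(v) c(mean = mean(v), sd = sd(v)))

# Number of ticks in each status level.
table(toy$status)
```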

Finally, I save the augmented TCP dataframe to a new CSV file, to be used in the next posts.

write.csv(df, file = "TCP_ea.csv", row.names = FALSE)   # write.csv always uses ","; passing sep only triggers a warning
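A CSV file stores factors as plain text, so the categorical columns have to be converted back to factors when TCP_ea.csv is read in a later post. The sketch below, not part of the original post, shows the round trip on a toy data frame written to a temporary file that stands in for TCP_ea.csv.

```r
# Sketch (not from the original post): factors do not survive a CSV round trip.
toy.df <- data.frame(time = c(0, 0.02, 0.04),
                     status = factor(c("slowstart", "slowstart", "ca_transient")))
f <- tempfile(fileext = ".csv")          # stand-in for "TCP_ea.csv"
write.csv(toy.df, file = f, row.names = FALSE)

back <- read.csv(f, header = TRUE, stringsAsFactors = FALSE)
class(back$status)                       # "character": the factor type was lost

# Convert whichever categorical columns are present back to factors.
cat.cols <- intersect(c("status", "S_slope", "S_slope_rate"), names(back))
back[cat.cols] <- lapply(back[cat.cols], factor)
```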

Conclusions

In this post I identified the different phases of the TCP transmission rate and added this information to the data generated by the TCP model simulation, saving the result to a new file. Associating a specific TCP status with each observation makes the simulated TCP variables easier to interpret.

In the next post, I will continue the exploratory analysis with summaries conditioned on TCP status, transmission rate slope and slope rate type.

Sunday, April 17, 2016
