Sunday, April 17, 2016

TCP reloaded (part 2)

Exploratory Analysis - step #1

Abstract

In this post I start the exploratory analysis of the data collected by the simulation outlined in my previous post. Specifically, I identify the different phases through which the TCP variables evolve. Labelling each observation with its phase makes the statistical dispersion of the collected variables easier to interpret, because every value can then be associated with a specific context.

Analysis

As a first step, I load the data generated by the TCP simulation.

suppressPackageStartupMessages(library(ggplot2))
suppressPackageStartupMessages(library(Rmisc))
suppressPackageStartupMessages(library(knitr))

df <- read.csv("TCP.csv", header=TRUE, stringsAsFactors = FALSE, sep=",")
dim(df)
## [1] 2501    7
kable(head(df,5))
time  W          R    q          x  p          S
0.00  2  0.2000000  0.0  0.0000000  0  10.000000
0.02  2  0.2066667  0.2  0.0000000  0   9.677419
0.04  2  0.2000000  0.0  0.0015038  0  10.000000
0.06  2  0.2066667  0.2  0.0014925  0   9.677419
0.08  2  0.2000000  0.0  0.0029851  0  10.000000
kable(tail(df,5))
       time         W          R         q         x          p         S
2497  49.92  27.41972  0.9893418  23.68025  21.86931  0.0463816  27.71512
2498  49.94  27.25502  0.9878185  23.63456  21.88293  0.0467328  27.59112
2499  49.96  27.08724  0.9862126  23.58638  21.89610  0.0470732  27.46593
2500  49.98  26.91647  0.9845232  23.53570  21.90881  0.0474025  27.33960
2501  50.00  26.74279  0.9827496  23.48249  21.92104  0.0477203  27.21221

Let us plot again the time evolution of the collected variables W, R, S, q, x and p.

p1 <- ggplot(data=df, aes(x=time, y = W)) + geom_line() + ggtitle("TCP sender window size")
p2 <- ggplot(data=df, aes(x=time, y = R)) + geom_line() + ggtitle("TCP round-trip-time")
p3 <- ggplot(data=df, aes(x=time, y = S)) + geom_line() + ggtitle("TCP transmission rate")
p4 <- ggplot(data=df, aes(x=time, y = q)) + geom_line() + ggtitle("FIFO buffer queue length")
p5 <- ggplot(data=df, aes(x=time, y = x)) + geom_line() + ggtitle("FIFO buffer queue EWMA")
p6 <- ggplot(data=df, aes(x=time, y = p)) + geom_line() + ggtitle("RED drop probability")
multiplot(p1, p2, p3, p4, p5, p6, cols=2)

I am going to identify the time tick at which slow start ends and congestion avoidance begins.

The congestion avoidance timeline will be further split into two time windows:

  • the first one runs from the end of slow start to the first multiplicative decrease of the window

  • the second one starts at that event and runs to the end of the TCP simulation.

(S.max <- max(df$S))
## [1] 41.94982
(S.max.t <- which(df$S == S.max))
## [1] 39
tick1 <- S.max.t
status <- rep("slowstart", S.max.t)
S2 <- df$S[-c(1:S.max.t)]
S2.diff <- sign(diff(S2))
S2.diff2 <- diff(S2.diff)
(tick.min <- which(S2.diff2 == 2)[1])
## [1] 820
(tick2 <- tick1 + tick.min)
## [1] 859
status <- c(status, rep("ca_transient", tick.min))
status <- c(status, rep("ca_steadystate", nrow(df)-tick2))
df$status <- factor(status)
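
The turning-point detection used above can be checked on a small hand-made series. The sketch below is only an illustration: the vector s and the names slope.sign, turn, local.min.idx and local.max.idx are made up for this example and are not part of the simulation data.

```r
# Toy series (made-up values): falls to a minimum, rises to a peak,
# falls to a second minimum, then rises again.
s <- c(5, 4, 3, 2, 3, 4, 3, 2.5, 3)

slope.sign <- sign(diff(s))   # -1 while falling, +1 while rising
turn <- diff(slope.sign)      # +2 marks fall-to-rise, -2 marks rise-to-fall

# which() gives positions in 'turn'; the extremum itself sits one sample later.
local.min.idx <- which(turn == 2) + 1
local.max.idx <- which(turn == -2) + 1

s[local.min.idx]   # values at the local minima
s[local.max.idx]   # values at the local maxima
```

Note that which() returns a position in the twice-differenced vector, so the minimum itself is one sample further on. In the code above tick2 is computed without that extra step, so it appears to land on the sample just before the minimum, a one-row offset that should not matter for labelling the phases.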

I plot the time evolution of S, W, R, q, x and p again, this time coloring each curve by the status variable computed above.

p.col <- c("#FF6666", "#6644FF", "#22FF22")
ggplot(data=df, aes(x=time, y = S, color = status)) + geom_line() + scale_color_manual(values=p.col) + ggtitle("TCP transmission rate")

ggplot(data=df, aes(x=time, y = W, color = status)) + geom_line() + 
    scale_color_manual(values=p.col) +  ggtitle("TCP sender window size")

ggplot(data=df, aes(x=time, y = R, color = status)) + geom_line() + scale_color_manual(values=p.col) + ggtitle("TCP round-trip-time")

ggplot(data=df, aes(x=time, y = q, color = status)) + geom_line() + scale_color_manual(values=p.col)  + ggtitle("FIFO buffer queue length")

ggplot(data=df, aes(x=time, y = x, color = status)) + geom_line() + scale_color_manual(values=p.col)  + ggtitle("FIFO Buffer queue EWMA")

ggplot(data=df, aes(x=time, y = p, color = status)) + geom_line() + scale_color_manual(values=p.col) + ggtitle("RED drop probability")

The plots above clearly show what happens during the three TCP phases: slow start, the congestion avoidance transient and the congestion avoidance steady state.

In our simulation scenario, all the collected variables show a clear periodic pattern during the congestion avoidance steady-state window.

Additionally, I want to identify:

  • when the TCP transmission rate is increasing or decreasing

  • when the TCP transmission rate is changing quickly or slowly (high or low absolute slope).

S.diff <- diff(df$S)
S.diff.sign <- sign(S.diff)
S_slope <- ifelse(S.diff.sign > 0, "increase", "decrease")
df$S_slope <- c(S_slope, S_slope[length(S_slope)])
df$S_slope <- factor(df$S_slope)

S_slope_rate <- ifelse(abs(S.diff) > 0.01, "high", "low")
df$S_slope_rate <- c(S_slope_rate, S_slope_rate[length(S_slope_rate)])
df$S_slope_rate <- factor(df$S_slope_rate)
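
A small worked example may help to see how the labels line up with the rows. diff() returns one value fewer than its input, which is why the last label is repeated. The vector s, the helper pad.last and the data frame labels below are hypothetical names used only for this sketch; the 0.01 threshold is the one used above.

```r
# Toy transmission-rate samples (made-up values).
s <- c(1.000, 1.005, 1.505, 1.500, 1.000)

d <- diff(s)   # four differences for five samples

# A zero difference would be labelled "decrease", since sign(0) is not > 0.
direction <- ifelse(sign(d) > 0, "increase", "decrease")
speed <- ifelse(abs(d) > 0.01, "high", "low")

# Repeat the last label so that the result has one entry per sample.
pad.last <- function(v) c(v, v[length(v)])

labels <- data.frame(s = s,
                     direction = pad.last(direction),
                     speed = pad.last(speed),
                     stringsAsFactors = FALSE)
labels
```

Each label describes the step from that row to the next one, so the label on the last row is simply a copy of the one before it.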

The plots below verify that the new labels are correct.

ggplot(data = df, aes(x = time, y = S, color = S_slope)) + geom_point() + ggtitle("TCP transmission rate")

ggplot(data = df, aes(x = time, y = S, color = S_slope_rate)) + geom_point() + ggtitle("TCP transmission rate")

As a result, the original TCP dataframe has been augmented with three new columns, categorical variables that indicate which conditions hold at each time tick.

In this way, data summaries can be more effective when conditioned on those new variables.
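
As a rough illustration of what conditioning means here, the sketch below compares an overall mean with per-status means using base R's aggregate(). The data frame toy and its values are invented for the example and do not come from the simulation.

```r
# Made-up transmission-rate values with a phase label for each row.
toy <- data.frame(
  S = c(10, 20, 40, 30, 28, 26, 27, 29),
  status = c("slowstart", "slowstart", "slowstart",
             "ca_transient", "ca_transient",
             "ca_steadystate", "ca_steadystate", "ca_steadystate"),
  stringsAsFactors = FALSE
)

# A single mean over all rows mixes the three phases together.
overall.mean <- mean(toy$S)

# One mean per phase keeps them apart.
by.status <- aggregate(S ~ status, data = toy, FUN = mean)
by.status
```

The same call with FUN = sd or FUN = range would give the per-phase dispersion instead of the mean.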

Finally, I save the augmented TCP dataframe to a new csv file to be used in the next posts.

write.csv(df, file = "TCP_ea.csv", row.names = FALSE)
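
One thing to keep in mind when the file is read back: a csv stores factor columns as plain text, so they come back as character vectors when stringsAsFactors = FALSE is used. The sketch below shows one possible way to restore them; it uses a temporary file and a made-up data frame toy, and the level order is an assumption chosen to follow the phases in time.

```r
# Made-up three-row data frame with a factor column.
toy <- data.frame(time = c(0, 0.02, 0.04),
                  status = factor(c("slowstart", "slowstart", "ca_transient")))

path <- tempfile(fileext = ".csv")
write.csv(toy, file = path, row.names = FALSE)

back <- read.csv(path, header = TRUE, stringsAsFactors = FALSE)
status.class <- class(back$status)   # character, not factor

# Rebuild the factor with an explicit level order (assumed here).
back$status <- factor(back$status,
                      levels = c("slowstart", "ca_transient", "ca_steadystate"))

unlink(path)   # remove the temporary file
```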

Conclusions

In this post I identified the different phases of the TCP transmission rate and added that information to the output file generated by the TCP model simulation. Each simulated TCP variable can now be interpreted in the context of a specific TCP status.

In the next post, I will continue the exploratory analysis, mainly by showing summaries conditioned on the TCP status, the slope direction and the slope rate.
