Exploratory Analysis - step #1
Abstract
In this post I start the exploratory analysis of the data collected by the simulation outlined in my previous post. Specifically, I identify the distinct phases through which the TCP variables evolve. This distinction makes it easier to highlight the statistical dispersion of the collected variables, since each sample can be associated with a specific context.
Analysis
First, I load the data generated by the TCP simulation.
# ggplot2 for plotting, Rmisc for multiplot(), knitr for kable() tables
suppressPackageStartupMessages(library(ggplot2))
suppressPackageStartupMessages(library(Rmisc))
suppressPackageStartupMessages(library(knitr))
# load the simulation output: one row per time tick
df <- read.csv("TCP.csv", header=TRUE, stringsAsFactors = FALSE, sep=",")
dim(df)
## [1] 2501 7
kable(head(df,5))
time | W | R | q | x | p | S |
---|---|---|---|---|---|---|
0.00 | 2 | 0.2000000 | 0.0 | 0.0000000 | 0 | 10.000000 |
0.02 | 2 | 0.2066667 | 0.2 | 0.0000000 | 0 | 9.677419 |
0.04 | 2 | 0.2000000 | 0.0 | 0.0015038 | 0 | 10.000000 |
0.06 | 2 | 0.2066667 | 0.2 | 0.0014925 | 0 | 9.677419 |
0.08 | 2 | 0.2000000 | 0.0 | 0.0029851 | 0 | 10.000000 |
kable(tail(df,5))
row | time | W | R | q | x | p | S |
---|---|---|---|---|---|---|---|
2497 | 49.92 | 27.41972 | 0.9893418 | 23.68025 | 21.86931 | 0.0463816 | 27.71512 |
2498 | 49.94 | 27.25502 | 0.9878185 | 23.63456 | 21.88293 | 0.0467328 | 27.59112 |
2499 | 49.96 | 27.08724 | 0.9862126 | 23.58638 | 21.89610 | 0.0470732 | 27.46593 |
2500 | 49.98 | 26.91647 | 0.9845232 | 23.53570 | 21.90881 | 0.0474025 | 27.33960 |
2501 | 50.00 | 26.74279 | 0.9827496 | 23.48249 | 21.92104 | 0.0477203 | 27.21221 |
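The row count is consistent with the time column: the head and tail show time running from 0 to 50 in steps of 0.02, which gives 50 / 0.02 + 1 = 2501 ticks. A quick check, assuming the step is uniform throughout (the head and tail suggest it but do not prove it):

```r
# Sampling grid implied by the head and tail of the data:
# time runs from 0 to 50 in steps of 0.02 (uniform step assumed).
time.grid <- seq(0, 50, by = 0.02)
length(time.grid)   # 50 / 0.02 + 1 = 2501, matching nrow(df)
```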
Let us plot again the time evolution of the collected variables W, R, S, q, x and p.
# one line plot per variable against simulation time
p1 <- ggplot(data=df, aes(x=time, y = W)) + geom_line() + ggtitle("TCP sender window size")
p2 <- ggplot(data=df, aes(x=time, y = R)) + geom_line() + ggtitle("TCP round-trip-time")
p3 <- ggplot(data=df, aes(x=time, y = S)) + geom_line() + ggtitle("TCP transmission rate")
p4 <- ggplot(data=df, aes(x=time, y = q)) + geom_line() + ggtitle("FIFO buffer queue length")
p5 <- ggplot(data=df, aes(x=time, y = x)) + geom_line() + ggtitle("FIFO buffer queue EWMA")
p6 <- ggplot(data=df, aes(x=time, y = p)) + geom_line() + ggtitle("RED drop probability")
# arrange the six plots in a two-column grid (Rmisc::multiplot)
multiplot(p1, p2, p3, p4, p5, p6, cols=2)
Next, I identify the time tick at which slow start terminates and congestion avoidance begins, taking it as the tick at which the transmission rate S reaches its maximum.
The congestion avoidance timeline is further split into two time windows:
the first one covers the interval from the end of slow start to the first multiplicative window decrease event, detected as the first local minimum of S after its peak;
the second one starts from that event and lasts until the end of the TCP simulation.
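The end of the transient is found by looking at where the slope of S changes sign. On a toy vector (values made up for illustration), `diff(sign(diff(x)))` equals +2 exactly where a fall is followed by a rise, i.e. one position before a local minimum:

```r
# Toy series (hypothetical values, not simulation output):
# it rises, falls to a minimum at position 5, then rises again.
S.toy <- c(1, 3, 6, 4, 2, 3, 5, 4)

s1 <- sign(diff(S.toy))   # +1 where the series rises, -1 where it falls
s2 <- diff(s1)            # +2 where a fall is followed by a rise

print(s1)
print(s2)
# Element i of s2 compares steps i and i+1, so the minimum sits at i + 1
(local.min <- which(s2 == 2) + 1)
```

Because of that one-position shift, in the code that follows the first local minimum of S after its peak falls at row `tick2 + 1`, which is the first sample labeled `ca_steadystate`.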
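# slow start ends at the peak of the transmission rate S
(S.max <- max(df$S))
## [1] 41.94982
(S.max.t <- which(df$S == S.max))
## [1] 39
tick1 <- S.max.t
status <- rep("slowstart", S.max.t)
# look for the first local minimum of S after its peak:
# sign(diff()) is +1 on rises and -1 on falls, so a +2 step
# in its own diff marks a fall followed by a rise
S2 <- df$S[-c(1:S.max.t)]
S2.diff <- sign(diff(S2))
S2.diff2 <- diff(S2.diff)
(tick.min <- which(S2.diff2 == 2)[1])
## [1] 820
(tick2 <- tick1 + tick.min)
## [1] 859
# label the transient up to tick2 and the steady state afterwards
status <- c(status, rep("ca_transient", tick.min))
status <- c(status, rep("ca_steadystate", nrow(df)-tick2))
df$status <- factor(status)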
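For reuse in later posts, the segmentation can be wrapped into a function. The sketch below is a hypothetical helper, `label_tcp_phases()`, not part of the original code: it applies the same rules, but uses `which.max()`, which returns a single index even if the maximum is attained more than once, and it stops with an error if no local minimum follows the peak. It is exercised on a made-up toy series.

```r
# Hypothetical helper (not part of the original post): labels each sample
# of a transmission-rate series S with its TCP phase, using the same rules
# as above: slow start ends at the peak of S, and the transient ends right
# before the first local minimum of S after that peak.
label_tcp_phases <- function(S) {
  peak <- which.max(S)                    # first index of the maximum
  after <- S[-seq_len(peak)]              # samples following the peak
  s2 <- diff(sign(diff(after)))           # +2 marks a fall followed by a rise
  tick.min <- which(s2 == 2)[1]
  if (is.na(tick.min)) stop("no local minimum found after the peak")
  tick2 <- peak + tick.min
  status <- c(rep("slowstart", peak),
              rep("ca_transient", tick.min),
              rep("ca_steadystate", length(S) - tick2))
  # explicit levels keep the phases in chronological order
  factor(status, levels = c("slowstart", "ca_transient", "ca_steadystate"))
}

# Synthetic example: ramp up, overshoot, dip, then oscillate
S.toy <- c(1, 2, 4, 8, 6, 5, 4, 5, 6, 5, 6)
table(label_tcp_phases(S.toy))
```

The explicit `levels` keep tables and legends in chronological order. Note that with `scale_color_manual(values = p.col)` this also changes which color each phase receives, since unnamed colors are matched to the factor levels in order.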
I plot again the time evolution of S, W, R, q, x and p, coloring each curve by the status variable computed above.
# one color per TCP phase, assigned in the (alphabetical) order of the factor levels
p.col <- c("#FF6666", "#6644FF", "#22FF22")
ggplot(data=df, aes(x=time, y = S, color = status)) + geom_line() + scale_color_manual(values=p.col) + ggtitle("TCP transmission rate")
ggplot(data=df, aes(x=time, y = W, color = status)) + geom_line() + scale_color_manual(values=p.col) + ggtitle("TCP sender window size")
ggplot(data=df, aes(x=time, y = R, color = status)) + geom_line() + scale_color_manual(values=p.col) + ggtitle("TCP round-trip-time")
ggplot(data=df, aes(x=time, y = q, color = status)) + geom_line() + scale_color_manual(values=p.col) + ggtitle("FIFO buffer queue length")
ggplot(data=df, aes(x=time, y = x, color = status)) + geom_line() + scale_color_manual(values=p.col) + ggtitle("FIFO buffer queue EWMA")
ggplot(data=df, aes(x=time, y = p, color = status)) + geom_line() + scale_color_manual(values=p.col) + ggtitle("RED drop probability")
The plots above clearly show what happens during the three TCP phases: slow start, congestion avoidance transient state and congestion avoidance steady state.
In our simulation scenario, all the collected variables show a clear seasonal (periodic) pattern during the congestion avoidance steady-state time window.
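To put a number on that periodic pattern instead of judging it by eye, one option for a smooth, noise-free series like this simulation output is the median spacing between consecutive local maxima. `estimate_period()` below is a hypothetical helper, checked here on a synthetic sinusoid of known period 4 sampled with the same 0.02 step. Applied to the steady-state rows of `df`, e.g. `estimate_period(df$time[df$status == "ca_steadystate"], df$S[df$status == "ca_steadystate"])`, it would give the period of S in simulation time units; no value is reported here.

```r
# Hypothetical helper: estimates the period of an oscillating series as the
# median spacing (in time units) between consecutive local maxima.
estimate_period <- function(time, y) {
  s2 <- diff(sign(diff(y)))
  peaks <- which(s2 == -2) + 1   # indices of local maxima
  if (length(peaks) < 2) return(NA_real_)
  median(diff(time[peaks]))
}

# Synthetic check: a sinusoid with period 4, sampled every 0.02 time units
t.toy <- seq(0, 20, by = 0.02)
y.toy <- 25 + 3 * sin(2 * pi * t.toy / 4)
estimate_period(t.toy, y.toy)
```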
Additionally, I want to identify:
when the TCP transmission rate is increasing or decreasing;
when the TCP transmission rate is changing quickly or slowly, i.e. whether the absolute change of S between consecutive ticks is above or below 0.01.
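# step-to-step change of S; its sign gives the slope direction
S.diff <- diff(df$S)
S.diff.sign <- sign(S.diff)
# zero changes are labeled "decrease"
S_slope <- ifelse(S.diff.sign > 0, "increase", "decrease")
# diff() returns one value fewer than nrow(df): repeat the last label
df$S_slope <- c(S_slope, S_slope[length(S_slope)])
df$S_slope <- factor(df$S_slope)
# absolute changes above 0.01 are "high", the rest "low"
S_slope_rate <- ifelse(abs(S.diff) > 0.01, "high", "low")
df$S_slope_rate <- c(S_slope_rate, S_slope_rate[length(S_slope_rate)])
df$S_slope_rate <- factor(df$S_slope_rate)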
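The same classification can be packaged as a small function and checked on a few hand-picked values. `classify_slope()` is a hypothetical helper with the post's 0.01 threshold as default; it reproduces the choices made above, including labeling a zero change as a decrease and repeating the last label so there is one label per row.

```r
# Hypothetical helper mirroring the code above: labels each sample by the
# sign and magnitude of the step-to-step change of S. The last label is
# repeated so the result has one entry per sample, as in the original code.
classify_slope <- function(S, threshold = 0.01) {
  d <- diff(S)
  slope <- ifelse(d > 0, "increase", "decrease")   # zero change -> "decrease"
  rate  <- ifelse(abs(d) > threshold, "high", "low")
  data.frame(S_slope      = factor(c(slope, slope[length(slope)])),
             S_slope_rate = factor(c(rate, rate[length(rate)])))
}

S.toy <- c(10, 10.5, 10.505, 10.2, 10.2)
classify_slope(S.toy)
```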
The plots below verify that these labels are correct.
ggplot(data = df, aes(x = time, y = S, color = S_slope)) + geom_point() + ggtitle("TCP transmission rate")
ggplot(data = df, aes(x = time, y = S, color = S_slope_rate)) + geom_point() + ggtitle("TCP transmission rate")
As a result, the original TCP dataframe has been augmented with three new columns, status, S_slope and S_slope_rate, whose values are categorical variables indicating which conditions are in place at each time tick.
This way, data summaries become more informative when conditioned on these new variables.
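As a minimal sketch of such a conditioned summary, assuming a data frame with the same `S` and `status` columns (the values below are made up), base R's `aggregate()` and `tapply()` compute per-phase statistics:

```r
# Synthetic data frame with the same column names as the augmented TCP data
# (values are made up for illustration only).
toy <- data.frame(
  S      = c(5, 20, 40, 30, 25, 22, 27, 21, 26),
  status = factor(c("slowstart", "slowstart", "slowstart",
                    "ca_transient", "ca_transient",
                    "ca_steadystate", "ca_steadystate",
                    "ca_steadystate", "ca_steadystate"))
)

# Mean and standard deviation of S within each TCP phase
stats <- aggregate(S ~ status, data = toy,
                   FUN = function(v) c(mean = mean(v), sd = sd(v)))
print(stats)

# The same means computed with tapply, returned as a named array
tapply(toy$S, toy$status, mean)
```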
Finally, I save the augmented TCP dataframe to a new csv file to be used in the next posts.
# write.csv() always uses a comma separator (a sep argument is ignored with a warning)
write.csv(df, file = "TCP_ea.csv", row.names = FALSE)
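One point to keep in mind when reading `TCP_ea.csv` back: a CSV file stores factors as plain text, so with `stringsAsFactors = FALSE`, as used at the top of this post, `status`, `S_slope` and `S_slope_rate` come back as character vectors. The sketch below shows the round trip on a small made-up data frame written to a temporary file.

```r
# Round-trip check on a small made-up data frame and a temporary file:
# CSV files do not store factor levels, so categorical columns come back
# as character vectors and have to be converted again after reading.
toy <- data.frame(time   = c(0, 0.02, 0.04),
                  S      = c(10, 9.677419, 10),
                  status = factor(c("slowstart", "slowstart", "ca_transient")))

path <- tempfile(fileext = ".csv")
write.csv(toy, file = path, row.names = FALSE)

back <- read.csv(path, header = TRUE, stringsAsFactors = FALSE)
str(back$status)                      # character, not factor
back$status <- factor(back$status)    # restore the categorical type
unlink(path)                          # remove the temporary file
```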
Conclusions
In this post I identified the different TCP phases and transmission-rate behaviors and added this information to the output of the TCP model simulation, saved as a new file. Associating each sample with a specific TCP status improves the interpretation of the simulated TCP variables.
In the next post, I will continue the exploratory analysis, mainly showing summaries conditioned on TCP status, transmission-rate slope and slope-rate type.