Sunday, May 29, 2016

TCP reloaded (part 5)

Working points analysis

Abstract

In this post I am going to analyse the {R, S} phase diagram in order to identify relevant working points. Among those, the following ones are of major interest:

  • the minimum round-trip-time (R) working point
  • the maximum transmission rate (S) working point

Furthermore, I will show some inefficiencies in the TCP sender process and analyse their major root causes. Ultimately, I will experiment some basic remedies and show how the behavior improved.

Analysis

I start as usual by loading the data produced by the TCP mathematical model simulation to work only with the records associated to the congestion avoidance steady state.

suppressPackageStartupMessages(library(ggplot2))
suppressPackageStartupMessages(library(Rmisc))
suppressPackageStartupMessages(library(knitr))
suppressPackageStartupMessages(library(scales))

df <- read.csv("TCP_ea.csv", header=TRUE, stringsAsFactors = TRUE, sep=",")
df.ss <- subset(df, status == "ca_steadystate")
kable(head(df.ss,5))
time W R q x p S status S_slope S_slope_rate
860 17.18 13.18289 0.6556589 13.66977 20.81397 0.0216649 20.10633 ca_steadystate increase low
861 17.20 13.05330 0.6490631 13.47189 20.76025 0.0203492 20.11099 ca_steadystate increase low
862 17.22 12.92675 0.6424704 13.27411 20.70545 0.0190063 20.12038 ca_steadystate increase high
863 17.24 12.80440 0.6358840 13.07652 20.64957 0.0176362 20.13638 ca_steadystate increase high
864 17.26 12.68509 0.6293083 12.87925 20.59263 0.0162393 20.15719 ca_steadystate increase high
kable(tail(df.ss,5))
time W R q x p S status S_slope S_slope_rate
2497 49.92 27.41972 0.9893418 23.68025 21.86931 0.0463816 27.71512 ca_steadystate decrease high
2498 49.94 27.25502 0.9878185 23.63456 21.88293 0.0467328 27.59112 ca_steadystate decrease high
2499 49.96 27.08724 0.9862126 23.58638 21.89610 0.0470732 27.46593 ca_steadystate decrease high
2500 49.98 26.91647 0.9845232 23.53570 21.90881 0.0474025 27.33960 ca_steadystate decrease high
2501 50.00 26.74279 0.9827496 23.48249 21.92104 0.0477203 27.21221 ca_steadystate decrease high

During the congestion avoidance steady state, the {R, S} phase diagram is traversed multiple times. Hence, it is of interest to determine the cycle time period. In order to do that, I indentify the events whereby probability p value changes from zero to a value strictly greater than zero. Those events occur periodically as the phase diagram is being traversed.

findcycle <- function(df) { 
  cycle <- list()
  for (i in 1:(nrow(df.ss)-1)) {
    if (df.ss[i,"p"] == 0 && df.ss[i+1,"p"] > 0) {
      cycle <- c(cycle, i)
    }
  }
  cycle <- unlist(cycle)
  cycle
}

(cycle <- findcycle(df.ss))
## [1]  693 1546
cycle.start <- cycle[1]
cycle.finish <- cycle[2]-1
cycle.interval <- seq(cycle.start, cycle.finish)

kable(df.ss[cycle.start,, drop=FALSE])
time W R q x p S status S_slope S_slope_rate
1552 31.02 29.85692 0.9630124 22.89037 20.02084 0 31.00367 ca_steadystate decrease low
kable(df.ss[cycle.finish,, drop=FALSE])
time W R q x p S status S_slope S_slope_rate
2404 48.06 29.82268 0.9618724 22.85617 19.9876 0 31.00482 ca_steadystate decrease low
df.cycle <- df.ss[cycle.interval,, drop=FALSE]

The minimum round trip time, \(R_{min}\), working point shall be labeled as A.

The maximum transmission speed, \(S_{max}\), shall be labeled as B.

Working point A is so determined:

R.min <- min(df.cycle$R)
df.R.min <- which(df.cycle$R == R.min)
S.A <- df.cycle[df.R.min, "S"]

kable(df.cycle[df.R.min,, drop=FALSE])
time W R q x p S status S_slope S_slope_rate
1784 35.66 12.08607 0.4022561 6.067684 15.89964 0 30.04571 ca_steadystate increase high

Working point B is so determined:

S.max <- max(df.cycle$S)
df.S.max <- which(df.cycle$S == S.max)
R.B <- df.cycle[df.S.max, "R"]

kable(df.cycle[df.S.max,, drop=FALSE])
time W R q x p S status S_slope S_slope_rate
1842 36.82 14.64805 0.4587836 7.763509 12.67531 0 31.92801 ca_steadystate decrease low

Now, I am going to plot again the phase diagram {R, S} while adding optimal working points annotations. The path from A to B is shown with blue color, the points A and B are marked with blue circles and the direction highlighted by a closed triangle shaped arrow.

p.col <- c("darkgreen", "yellow", "orange", "red")
phase.diagram <- ggplot(data = df, aes(x=R, y=S, color=p)) + 
  geom_path() +
  scale_colour_gradientn(colours = p.col) + 
  geom_point(aes(x = R.min, y = S.A), size = 3, color = 'blue') +
  geom_point(aes(x = R.B, y = S.max), size = 3, color = 'blue') +
  annotate("text", x = R.min-0.02, y = S.A, label = "A") +
  annotate("text", x = R.B, y = S.max+2, label = "B")

sub.df <- df.cycle[df.R.min:df.S.max,, drop=FALSE]
phase.diagram + geom_path(data = sub.df, aes(x = R, y = S), color = 'blue',  
                          arrow = arrow(length = unit(0.1, "inches"), type = "closed"))

Ultimately, working points A and B can be so represented:

\[ \begin{equation} \begin{cases} A := \{R_{A}, S_{A}\} := \{R_{min}, S_{A}\} = \{0.4022561, 30.0457085\} \\ \\ B := \{R_{B}, S_{B}\} := \{R_{B}, S_{max}\} = \{0.4587836, 31.9280144\} \\ \end{cases} \end{equation} \]

The working point A can be prefererred for latency sensitive applications, such as interactive video/voice. The working point B can be preferred for bulk data oriented applications, not sensitive to latency, such as FTP file transfers/downloads, which requires to minimize the transfer time by maximizing transmission rate.

I am going to plot herein below the variation percentage of W, R, S with time along the path from A to B. The left plot shows increment variation percentage. The right plot shows variation percentage with respect the values at working point A.

df.cycle <- transform(df.cycle, W_var_perc = 100*c(NA,diff(W))/W,
                             R_var_perc = 100*c(NA,diff(R))/R,
                             S_var_perc = 100*c(NA,diff(S))/S)

values <- df.cycle[df.R.min:df.S.max, c("time", "W_var_perc", "R_var_perc", "S_var_perc")]

p1 <- ggplot(data=values) + ylab("percentage variation") +    
  theme(legend.title = element_blank()) +
  geom_line(aes(x = time, y = W_var_perc, colour = "W_var_perc")) +
  geom_line(aes(x = time, y = R_var_perc, colour = "R_var_perc")) +
  geom_line(aes(x = time, y = S_var_perc, colour = "S_var_perc"))

W.A <- df.cycle[df.R.min, "W"]
R.A <- df.cycle[df.R.min, "R"]
S.A <- df.cycle[df.R.min, "S"]

df.cycle <- transform(df.cycle, W_var_perc = 100*(W-W.A)/W.A,
                                R_var_perc = 100*(R-R.A)/R.A,
                                S_var_perc = 100*(S-S.A)/S.A)
col.select <- setdiff(colnames(df.cycle), c("W_var_perc", "R_var_perc", "S_var_perc"))

values <- df.cycle[df.R.min:df.S.max, c("time", "W_var_perc", "R_var_perc", "S_var_perc")]

p2 <- ggplot(data=values) + ylab("percentage variation respect A") +    
  theme(legend.title = element_blank()) +
  geom_line(aes(x = time, y = W_var_perc, colour = "W_var_perc")) +
  geom_line(aes(x = time, y = R_var_perc, colour = "R_var_perc")) +
  geom_line(aes(x = time, y = S_var_perc, colour = "S_var_perc"))

multiplot(p1, p2, cols=2)

It is rather evident that the transmission rate increment percentage collapses for further increments of the transmit window size while surpassing working point B. At the same time, there is a remarkable round-trip-time increase. There are no advantages in increasing the window size beyond working point B.

Additional relevants working points are defined as:

  • maximum round trip time working point C

  • minimum transmission rate working point D

Both represents inefficient working points, which do not provide any added value.

R.max <- max(df.cycle$R)
df.R.max <- which(df.cycle$R == R.max)
S.C <- df.cycle[df.R.max, "S"]

kable(df.cycle[df.R.max, col.select])
time W R q x p S status S_slope S_slope_rate
1622 32.42 30.09978 1.004225 24.12674 21.49836 0.0369614 29.97315 ca_steadystate decrease high
S.min <- min(df.cycle$S)
df.S.min <- which(df.cycle$S == S.min)
R.D <- df.cycle[df.S.min, "R"]

kable(df.cycle[df.S.min, col.select])
time W R q x p S status S_slope S_slope_rate
1717 34.32 13.46009 0.6625266 13.8758 20.80477 0.0213949 20.31631 ca_steadystate increase low

Based on above computation, the working point C and D are so determined:

\[ \begin{equation} \begin{cases} C := \{R_{C}, S_{C}\} := \{R_{max}, S_{C}\} = \{1.0042246, 29.9731524\} \\ \\ D := \{R_{D}, S_{D}\} := \{R_{D}, S_{min}\} = \{0.6625266, 20.3163068\} \\ \end{cases} \end{equation} \]

and the {R, S} phase diagram plot results as:

p.col <- c("darkgreen", "yellow", "orange", "red")
phase.diagram <- ggplot(data = df, aes(x=R, y=S, color=p)) + 
  geom_path() +
  scale_colour_gradientn(colours = p.col) + 
  geom_point(aes(x = R.min, y = S.A), size = 3, color = 'blue') +
  geom_point(aes(x = R.B, y = S.max), size = 3, color = 'blue') +
  geom_point(aes(x = R.max, y = S.C), size = 3, color = 'red') +
  geom_point(aes(x = R.D, y = S.min), size = 3, color = 'green') +
  annotate("text", x = R.min-0.02, y = S.A, label = "A") +
  annotate("text", x = R.B, y = S.max+2, label = "B") +
  annotate("text", x = R.max+0.02, y = S.C, label = "C") +
  annotate("text", x = R.D, y = S.min-2, label = "D")

sub.df <- df.cycle[df.R.min:df.S.max,, drop=FALSE]
phase.diagram + geom_path(data = sub.df, aes(x = R, y = S), color = 'blue',  
                          arrow = arrow(length = unit(0.1, "inches"), type = "closed"))

summary(df.cycle[,c("W", "R", "q", "S")])
##        W               R                q                S        
##  Min.   :10.99   Min.   :0.4023   Min.   : 6.068   Min.   :20.32  
##  1st Qu.:17.14   1st Qu.:0.5797   1st Qu.:11.390   1st Qu.:30.98  
##  Median :22.74   Median :0.7564   Median :16.691   Median :31.15  
##  Mean   :22.05   Mean   :0.7350   Mean   :16.051   Mean   :30.00  
##  3rd Qu.:27.30   3rd Qu.:0.8996   3rd Qu.:20.988   3rd Qu.:31.45  
##  Max.   :30.86   Max.   :1.0042   Max.   :24.127   Max.   :31.93

Then I compute the congestion avoidance steady state cycle period length.

R.min.time <- df.cycle[df.R.min, c("time")]
S.max.time <- df.cycle[df.S.max, c("time")]

(time.A2B <- (S.max.time - R.min.time))
## [1] 1.16
time.cycle.start <- df.cycle[1, c("time")]
time.cycle.finish <- df.cycle[nrow(df.cycle), c("time")]
(cycle.period <- (time.cycle.finish - time.cycle.start))
## [1] 17.04

It is remarkable that the TCP working point sojourns in the path {A, B} only for 6.81% of the cycle time. That highlights how much inefficient is the overall packet transmission process. To further understand why it is so inefficient, three major considerations:

  1. the TCP sender window increases beyond the point B without providing with any improvements in transmission rate or round trip time

  2. the RED discarding threshold intervention comes in too late

  3. the RED discarding probability is triggered by moving average (variable \(x\)) of the instantaneous queue length (variable \(q\)), hence \(x\) reaches the discarding threshold with some delay with respect to \(q\), as highlighted by the following plot

df.qx <- df.ss[,c("time", "q", "x")]
ggplot(data = df.qx) + geom_line(aes(x = time, y = q, color="queue")) + 
  geom_line(aes(x = time, y=x, color="moving average")) +
  ggtitle("Queue length and moving average comparison")

About first issue, we aim to determine if increasing the TCP sender window size improves or not the TCP transmission rate, hence it is necessary to compare:

\[ \begin{equation} \begin{aligned} \dfrac{W_{0}}{R_{0}}\ \ \ \ with \ \ \ \ \dfrac{(W_{0} + \Delta W_{0})}{(R_{0}+\Delta R_{0})} \\ \end{aligned} \end{equation} \]

Considering:

\[ \begin{equation} \begin{cases} {R_{0}\ =\ D + \dfrac{Q_{0}}{C}} \\ \\ {\Delta R_{0} = \dfrac{\Delta W_{0}}{C}} \\ \end{cases} \end{equation} \]

and that fixed delay \(D\) can be thought as determined by a virtual amount of fixed queueing \(Q^{*}\), as such:

\[ \begin{equation} \begin{aligned} D = \dfrac{Q^{*}}{C} \\ \end{aligned} \end{equation} \]

after some algebra, it is found there is speed increase starting from \(W_{0}\) current window size, if:

\[ \begin{equation} \begin{aligned} W_{0}\ <\ (Q_{0}\ +\ Q^{*}) \\ \end{aligned} \end{equation} \]

otherwise the TCP window increase results with a decrease in its transmission rate.

Based on our simulation configuration data:

\[ \begin{equation} \begin{aligned} Q^{*} = D * C = 0.2 * 30 = 6 \\ \end{aligned} \end{equation} \]

If we want to have a RED discarding probability greater than zero beyond point B, we have to set

\[ \begin{equation} \begin{aligned} T_{min}\ = df.ss[df.S.max, "q"] \ = 13.2682353 \\ \end{aligned} \end{equation} \]

and have RED discarding policy being triggered by the instantaneous queue length in place of its moving average. That implies to have equation 11 (describing the RED discarding profile) work based upon instantaneous queue length, this way:

\[ \begin{equation} p(x) =\left\{ \begin{array}{@{}ll@{}} 0 & \text{if}\ q < T_{min} \\ 1 & \text{if}\ q > T_{max} \\ \dfrac{q-T_{min}}{T_{max}-T_{min}}& \text{if}\ T_{min}<= q <= T_{max} \end{array} \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (eq.\ 11) \right. \end{equation} \]

About second issue, I re-run simulation with modified \(T_{min}\) equal to 13.2682353 and equation 11 updated as above. Code upates are herein shown.

param.init <- c(W0 = 2, # initial window size [packets],
                SST = 10, # slow start threshold
                C = 30, # link bandwidth [packets/s]
                D = 0.2, # delay [s]
                alpha = 0.0001, # EWMA constant
                beta = 0.5, # TCP window decrease factor
                delta = 0.000266, # sampling time [s]
                Tmin = 13.26, # 20, # RED minimum threshold
                Tmax = 60, # RED maximum threshold
                Qmax = 80 # maximum buffer size
                )

# equation 11
  y5 <- drop_p(y[t,"q"], parms["Tmin"], parms["Tmax"])

After re-running TCP model simulation and generating a second csv file named as TCP_2.csv, I re-run the first step of exploratory analysis which identifies slow-start, congestion avoidance transient and steady-state to produce a corresponding TCP_2_ea.csv file that I am going to analyse and identify as simulation #2 (while simulation #1 is the one introduced in this first post series).

df_2 <- read.csv("TCP_2_ea.csv", header = TRUE, stringsAsFactors = TRUE, sep=",")
df_2.ss <- subset(df_2, status == "ca_steadystate")

(cycle <- findcycle(df_2.ss))
## [1]  693 1546
cycle.start <- cycle[1]
cycle.finish <- cycle[2]-1
cycle.interval <- seq(cycle.start, cycle.finish)

kable(df_2.ss[cycle.start,, drop=FALSE])
time W R q x p S status S_slope S_slope_rate
1088 21.74 18.25944 0.5792239 11.37672 11.53596 0 31.52397 ca_steadystate increase low
kable(df_2.ss[cycle.finish,, drop=FALSE])
time W R q x p S status S_slope S_slope_rate
1940 38.78 17.88381 0.5678579 11.03574 11.577 0 31.49346 ca_steadystate increase low
df_2.cycle <- df_2.ss[cycle.interval,, drop=FALSE]

R_2.min <- min(df_2.cycle$R)
df_2.R.min <- which(df_2.cycle$R == R_2.min)
S_2.A <- df_2.cycle[df_2.R.min, "S"]

S_2.max <- max(df_2.cycle$S)
df_2.S.max <- which(df_2.cycle$S == S_2.max)
R_2.B <- df_2.cycle[df_2.S.max, "R"]

R_2.max <- max(df_2.cycle$R)
df_2.R.max <- which(df_2.cycle$R == R_2.max)
S_2.C <- df_2.cycle[df_2.R.max, "S"]

S_2.min <- min(df_2.cycle$S)
df_2.S.min <- which(df_2.cycle$S == S_2.min)
R_2.D <- df_2.cycle[df_2.S.min, "R"]

ggplot(data = df_2, aes(x=R, y=S, color=p)) + 
  geom_path() +
  scale_colour_gradientn(colours = p.col) + 
  geom_point(aes(x = R_2.min, y = S_2.A), size = 3, color = 'blue') +
  geom_point(aes(x = R_2.B, y = S_2.max), size = 3, color = 'blue') +
  geom_point(aes(x = R_2.max, y = S_2.C), size = 3, color = 'red') +
  geom_point(aes(x = R_2.D, y = S_2.min), size = 3, color = 'green') +
  annotate("text", x = R_2.min-0.02, y = S_2.A, label = "A") +
  annotate("text", x = R_2.B, y = S_2.max+2, label = "B") +
  annotate("text", x = R_2.max+0.02, y = S_2.C, label = "C") +
  annotate("text", x = R_2.D, y = S_2.min-2, label = "D")

(R_2.min.time <- df_2.cycle[df_2.R.min, c("time")])
## [1] 31.86
(S_2.max.time <- df_2.cycle[df_2.S.max, c("time")])
## [1] 21.98

Then I compute how long the working point sojourns between optimal working points A and B for this scenario.

# since R.2.min.time > S_2.max.time, B2A time computation is done
(time.B2A <- (R_2.min.time-S_2.max.time))
## [1] 9.88
time.cycle.start <- df_2.cycle[1, c("time")]
time.cycle.finish <- df_2.cycle[nrow(df_2.cycle), c("time")]
(cycle.period <- (time.cycle.finish - time.cycle.start))
## [1] 17.04

We can see now the TCP working point sojourns in the path {A, B} for 42% of the cycle time.

Again summaries and boxplots to compare both simulations results.

summary(df_2.cycle[,c("W", "R", "q", "S")])
##        W               R                q                S        
##  Min.   :14.83   Min.   :0.5276   Min.   : 9.829   Min.   :25.51  
##  1st Qu.:16.34   1st Qu.:0.5529   1st Qu.:10.588   1st Qu.:28.78  
##  Median :18.34   Median :0.6072   Median :12.215   Median :31.18  
##  Mean   :18.20   Mean   :0.6069   Mean   :12.208   Mean   :29.98  
##  3rd Qu.:20.06   3rd Qu.:0.6586   3rd Qu.:13.759   3rd Qu.:31.48  
##  Max.   :21.28   Max.   :0.6895   Max.   :14.684   Max.   :31.53
par(mfrow=c(2,2))
boxplot(df.cycle[,"W"], df_2.cycle[,"W"], names=c("sim#1", "sim#2"), col="lightgreen",
        main = "Transmission Window Size W")
boxplot(df.cycle[,"R"], df_2.cycle[,"R"], names=c("sim#1", "sim#2"), col="lightgreen",
        main = "Round Trip Time R")
boxplot(df.cycle[,"q"], df_2.cycle[,"q"], names=c("sim#1", "sim#2"), col="lightgreen",
        main = "Queue Length q")
boxplot(df.cycle[,"S"], df_2.cycle[,"S"], names=c("sim#1", "sim#2"), col="lightgreen",
        main = "Transmission Rate S")

We can observe how the queue length and round trip-time have been decreased. The minimum transmission rate value has been increased, even though the statistical dispersion of S value increased. The TCP sender window has been decreased and its statistical dispersion decreased.

Overall, both minimum threshold and instantaneous queue length changes have improved the overall process. There is still a delay the overtaking of point B is communicated back to the sender process due to the round-trip-time such notification takes to reach the source.

That can be further improved by having the TCP sender window size increments occur once per round-trip-time, as proposed for data center TCP (DCTCP).

Conclusions

Thanks to exploratory analysis taking advantage of phase diagrams, in particular, I was able to spot features and inefficiencies in the TCP sender process. Some of the remedies have been applied and the comparison with former performance has highlighted remarkable improvements. Sure what shown is just a basic scenario with respect the complexity of a more realistic one whereby multiple TCP connections share network resources in competitive fashion.