Thursday, January 1, 2015

Temperature monitoring and time series analysis (part 2)

Exploratory analysis (cont.)

I herein go on with the exploratory analysis started in the previous post. First, I load saved environment data and the required packages.

load(file="nottingham.RData")
suppressPackageStartupMessages(library(xts))
suppressPackageStartupMessages(library(knitr))
suppressPackageStartupMessages(library(ggplot2))
suppressPackageStartupMessages(library(tidyr))
suppressPackageStartupMessages(library(dplyr))
suppressPackageStartupMessages(library(Rmisc))
suppressPackageStartupMessages(library(extremevalues))
suppressPackageStartupMessages(library(strucchange))

It is time to take advantage of the mmt_df table to compute some summary statistics as based on same year or same month observations. The apply() function second parameter indicates if third paramter function is applied to rows (1) or columns (2). If applied to rows, we compute summary statistics for same year observations across months. If applied to columns, we compute summary statistics for same months observations across years.

apply(mmt_df[,-1], 1, summary)
##          [,1]  [,2]  [,3]  [,4]  [,5]  [,6]  [,7]  [,8]  [,9] [,10] [,11]
## Min.    39.80 39.70 37.50 36.60 37.50 36.30 39.20 35.20 37.30 31.30 37.10
## 1st Qu. 42.38 43.85 41.15 41.38 42.53 40.38 42.95 41.58 42.37 41.68 41.50
## Median  48.60 50.55 44.60 47.50 47.65 47.55 47.80 48.70 48.75 46.55 48.90
## Mean    48.89 50.73 47.28 47.84 48.72 48.46 49.37 48.37 48.99 48.13 49.15
## 3rd Qu. 54.82 57.42 54.65 53.13 56.72 55.20 56.97 54.78 55.65 57.62 57.78
## Max.    58.50 66.30 57.80 64.20 60.80 63.50 62.50 60.50 62.20 62.50 61.60
##         [,12] [,13] [,14] [,15] [,16] [,17] [,18] [,19] [,20]
## Min.    37.10 38.40 35.60 38.20 36.40 35.00 37.10 39.20 37.80
## 1st Qu. 40.05 42.25 41.40 42.20 43.28 41.53 40.95 45.48 42.02
## Median  46.55 45.95 49.45 49.05 47.85 46.80 49.15 49.25 47.25
## Mean    48.13 49.01 50.18 50.32 49.89 48.60 49.10 50.28 49.39
## 3rd Qu. 54.90 56.40 60.28 59.30 57.72 58.22 56.87 57.50 58.05
## Max.    60.60 63.50 65.50 66.50 64.60 61.10 61.80 60.40 61.80
apply(mmt_df[,-1], 1, sd)
##  [1]  7.154205  8.764633  7.911916  8.699264  8.394276  9.604588  8.467836
##  [8]  8.474060  8.263442 10.398368  8.973547  8.543933  8.718680 10.891458
## [15]  9.495725  9.456355  9.305228  9.234717  7.482358  8.609451
apply(mmt_df[,-1], 2, summary)
##         January February March April   May  June  July August September
## Min.      34.80    31.30 38.30 42.10 49.20 52.70 56.80  54.30     53.00
## 1st Qu.   38.78    38.35 40.38 45.40 51.13 56.98 60.32  59.82     54.63
## Median    39.70    39.55 42.60 46.80 52.90 58.45 61.75  60.50     56.60
## Mean      39.71    39.19 42.20 46.29 52.56 58.04 61.90  60.52     56.48
## 3rd Qu.   41.00    40.92 44.10 47.15 53.87 59.10 63.68  61.80     57.65
## Max.      44.20    43.40 47.30 48.90 55.70 60.80 66.50  64.90     60.10
##         October November December
## Min.      46.60    36.60    35.20
## 1st Qu.   48.28    41.60    37.25
## Median    49.90    42.85    39.50
## Mean      49.50    42.60    39.52
## 3rd Qu.   50.55    43.75    41.72
## Max.      54.20    47.80    45.80
apply(mmt_df[,-1], 2, sd)
##   January  February     March     April       May      June      July 
##  2.287558  2.702221  2.555175  1.688007  1.675646  1.924742  2.636784 
##    August September   October  November  December 
##  2.461407  2.008371  1.905525  2.594219  2.894023

Boxplots and density plots are capable to graphycally gather temperature observations statistical dispersion.

ggplot(data=mmt_df_long_2, aes(x = Month, y=Temperature, fill=Month)) +
  geom_boxplot() + 
  theme(legend.position="none") +
  scale_x_discrete(limits=mmt_df_long_2$Month[1:12]) +
  theme(axis.text.x = element_text(face="bold", angle=90)) +
  ggtitle("Temperature across same month of the year")

ggplot(data=mmt_df_long_2, aes(x = Year, y=Temperature, fill=Year)) +
  geom_boxplot() + 
  theme(legend.position="none") +
  theme(axis.text.x = element_text(face="bold", angle=90)) +
  ggtitle("Temperature across years")

ggplot(data=mmt_df_long_2, aes(x = Temperature, fill = 1)) +
  geom_density() + 
  theme(legend.position="none") +
  ggtitle("Temperature distribution density")

Temperature distribution densities per each month based on yearly collected observations are herein shown.

ggplot(data=mmt_df_long_2, aes(x=Temperature, fill=Month)) +
  facet_grid(. ~ Month) +
  geom_density() + 
  coord_flip() +
  theme(legend.position="none") +
  theme(axis.text.x = element_text(face="bold", angle=90)) +
  ggtitle("Temperature distribution density for each month")

Aggregating the result per year allows to have a look at how the average temperature per year evolved.

yearly.mean <- data.frame(Years = years, Temperature = coredata(apply.yearly(mmt, mean)))
kable(yearly.mean)
Years Temperature
1920 48.89167
1921 50.73333
1922 47.27500
1923 47.84167
1924 48.72500
1925 48.45833
1926 49.36667
1927 48.36667
1928 48.99167
1929 48.13333
1930 49.15000
1931 48.13333
1932 49.00833
1933 50.17500
1934 50.31667
1935 49.89167
1936 48.60000
1937 49.10000
1938 50.27500
1939 49.39167
summary(yearly.mean)
##      Years       Temperature   
##  Min.   :1920   Min.   :47.27  
##  1st Qu.:1925   1st Qu.:48.44  
##  Median :1930   Median :49.00  
##  Mean   :1930   Mean   :49.04  
##  3rd Qu.:1934   3rd Qu.:49.52  
##  Max.   :1939   Max.   :50.73
p1 <- ggplot(data=yearly.mean, aes(x = factor(0), y=Temperature)) +
  geom_boxplot() + 
  theme(legend.position="none") +
  xlab("") +
  ggtitle("Yearly Mean Temperature")
  
p2 <- ggplot(data=yearly.mean, aes(x = Temperature, fill = 1)) +
  geom_density() + 
  theme(legend.position="none") +
  ggtitle("Yearly Mean Temperature Density")

multiplot(p1, p2, cols=2)

I am going to investigate for outliers by taking advantage of the extremevalue package and its getOutliersI() and outlierPlot() functions.

# left outliers quantile
left_thresh <- 0.01 

# right outliers quantile
right_thresh <- 0.99

# determining the outliers
mmt_outlier <- getOutliersI(mmt_df_long_2$Temperature, 
                             FLim = c(left_thresh, right_thresh), 
                             distribution = "normal")
mmt_outlier$iLeft
## integer(0)
mmt_outlier$iRight
## integer(0)
# outliers are plotted in red color
outlierPlot(mmt_df_long_2$Temperature, mmt_outlier, mode="qq")

As aboveshown, no outliers are present.

Further I investigate any structural change present by taking advantage of the strucchange package and its efp and Fstats functions.

Specifically, the efp function allows to investigate the empirical fluctuation process according to a specified method such as OLS residuals.

par(mfrow=(c(1,2)))
temperature.efp <- efp(mmt ~ 1, type = "OLS-CUSUM")
plot(temperature.efp, alpha = 0.05, alt.boundary = FALSE)
plot(temperature.efp, alpha = 0.05, alt.boundary = TRUE)

From above shown plot there are not any thresholds (red shown) crossings, hence no shift in level associated to our time series.

Further the Fstat() funcion allows to investigate any structural changes whereby the breakpoint is not known in advance.

mmt_data <- coredata(mmt)
temperature.efp.efp.Fstats <- Fstats(mmt_data ~ 1,  from=0.05, to=0.95)
sctest(temperature.efp.efp.Fstats, type="supF")
## 
##  supF test
## 
## data:  temperature.efp.efp.Fstats
## sup.F = 1.6824, p-value = 0.9702

Based on resulting p-value, the null hypothesis of no structural changes cannot be rejected.

Conclusions

There is a plenty of information that can be highlighted even in a simple dataset as the one I have herein shown. All that in terms of summarisations and plots.

In the following posts I will analyze the time series structure of the dataset herein introduced.

Note

Original post published on January 1st 2015 has been updated by substantial reworking and enhancing the analysis. Due to blog limits in post size, I had also to split it in two separate posts.

No comments:

Post a Comment

Note: Only a member of this blog may post a comment.