Friday, October 30, 2015

Principal Component Analysis and images

Image reconstruction based on PCA

In this post, I am going to show how to decompose an image in its three RGB layers. Further I will apply Principal Component Analysis (PCA) on all three layers in order to capture just the amount of variance required as input parameter. I then reconstruct back the original image by comprising only the PCA components necessary and sufficient to hit the variance target.

To load the original image in jpeg format, I will take advantage of the jpeg R package.

library(jpeg)
img <- readJPEG("mycat.jpg", FALSE)
dim(img)
## [1] 816 612   3
d <- dim(img); xmax <- 10; ymax <- round(xmax*d[1]/d[2],0); 
xlim <- c(1,xmax); ylim <- c(1,ymax); offset <- 5
color.set <- c(rgb(1,0,0,0.2), rgb(0,1,0,0.2), rgb(0,0,1,0.2))

As you can see above by the dim() statement, the image is loaded as a three dimensional array, where three image layers, each one associated to a specific RGB color, find place.

Let us plot the original image.

plot(0, 0, type='n', xlim=c(1,xmax), ylim=ylim)
rasterImage(img, 1, 1, xmax, ymax)

original image.

I split the original image into its three RGB layers.

make.imglayer<-function(img, color) {
  empty <- array(0, dim=dim(img))
  if (color =="red") { 
    c <- 1
  } else if (color=="green") {
    c <- 2
  } else if (color == "blue") {
    c <- 3
  }
  empty[,,c] <- img[,,c]
  col <- rgb(empty[,,1], empty[,,2], empty[,,3])
  dim(col) <- dim(img[,,c])
  col
}

img.red <- make.imglayer(img, "red")
img.green <- make.imglayer(img, "green")
img.blue <- make.imglayer(img, "blue")

I plot separately each image layer, each one associated to one color, red, green and blue color in the order.

plot(0, 0, type='n', xlim=c(1,3*xmax+2*offset), ylim=ylim)
rasterImage(img.red, 1, 1, xmax , ymax)
rasterImage(img.green, xmax+offset, 1, 2*xmax+offset , ymax)
rasterImage(img.blue, 2*(xmax+offset), 1, 3*xmax+2*offset , ymax)

original image rgb decomposition.

Then, I implement a routine which applies PCA on a given image layer and returns back a list comprising of:

  • the number of principal components to capture the specified variance

  • the variance captured by each pca component

  • the reconstructed image based on the number of principal components of the first bullet

In order to achieve such goal, two functions have been implemented. First pca.comps() whose input is an array of the variance associated to each PCA component. The values in such sdev array are in decreasing order. I compute the minimum number of PCA components necessary and sufficient to reach the specified variance target. Second is the main PCA decomposition routine, pca.image(), which processes a specific image layer and returns back a list including all abovementioned variables.

pca.comps <- function(pca, variance) {
  sdev.all <- sum(pca$sdev)
  sdev.cum <- cumsum(pca$sdev)/sdev.all
  num <- which(sdev.cum*100 > variance)[1]
  num
}

pca.image <- function(img.layer, variance) {
  res <- prcomp(img.layer, center = FALSE, scale = FALSE)
  npc <- pca.comps(res, variance)
  reconstruction <- res$x[,1:npc] %*% t(res$rotation[,1:npc])
  # ensuring values are within [0,1] interval
  reconstruction[reconstruction > 1] <- 1
  reconstruction[reconstruction < 0] <- 0
  list("npc" = npc, "sdev"=res$sdev, "reconstruction" = reconstruction)
}

Suppose we want to build an image based on the 60% of original image variance. We apply PCA to each layer, identifiy how many PCA components to include and then build back the image layer based on those. We apply that procedure to each image layer and then finally combine them together to view a reconstructed image providing with RGB properties.

variance <- 60
pca.img <- array(0, dim=d)
pca.res <- list()
for (i in 1:3) {
  res <- pca.image(img[,,i], variance)
  pca.res <- c(pca.res, list("result"= res))
  pca.img[,,i] <- res$reconstruction
}

To understand better how the variance is progressively captured by PCA components, I plot the cumulative percentage amount of variance percentage associated to each specific number of PCA components. For each color-associated image layer, the result is very similar.

plot.sdev <- function(pca.res, col) {
  sdev.all <- sum(pca.res$sdev)
  sdev.cum <- cumsum(pca.res$sdev)/sdev.all
  plot(sdev.cum, pch=19, col=color.set[col], xlab="number of PCA components", 
       ylab="variance percentage captured")
  abline(v=pca.res$npc)
}

invisible(lapply(pca.res, function(x) plot.sdev(x, parent.frame()$i)))

pca.img.red <- make.imglayer(pca.img, "red")
pca.img.green <- make.imglayer(pca.img, "green")
pca.img.blue <- make.imglayer(pca.img, "blue")

I plot each PCA-approximated image layer side by side.

plot(0, 0, type='n', xlim=c(1,3*xmax+2*offset), ylim=ylim, xlab="x", ylab="y",
     main="Giorgio, restore back all image layers immediately!")
rasterImage(pca.img.red, 1, 1, xmax , ymax)
rasterImage(pca.img.green, xmax+offset, 1, 2*xmax+offset , ymax)
rasterImage(pca.img.blue, 2*(xmax+offset), 1, 3*xmax+2*offset , ymax)

pca rgb image.

Ultimately, I restore back everything properly and compare the original image I started with (left) with its 60% variance PCA based approximation (right).

plot(0, 0, type='n', xlim=c(1,2*xmax+2*offset), ylim=ylim)
rasterImage(img, 1, 1, xmax , ymax)
rasterImage(pca.img, xmax+offset, 1, 2*xmax+offset , ymax)

pca image.

If you inspect both images with attention, you will be able to spot some differences, anyway even with only 60% of variance the reconstructed image appears of good quality.

You can try reconstructing the image capturing some other variance value and see how it appears at the end.