Thursday, October 4, 2018

Plant Leaf Classification - Part 3

Plant Leaf Classification with Keras

Abstract

The UCI Machine Learning dataset I used in the previous posts also provides a pool of plant leaf images, whose features are reported in the shape, texture and margin datasets. In this post I show how to fit a deep learning model with Keras, using TensorFlow as back-end, to classify those plant leaf images.

Packages

suppressPackageStartupMessages(library(jpeg))
suppressPackageStartupMessages(library(EBImage))
suppressPackageStartupMessages(library(keras))
suppressPackageStartupMessages(library(grid))
suppressPackageStartupMessages(library(gridExtra))
suppressPackageStartupMessages(library(dplyr))

Analysis

Downloading and unzipping the UCI Machine Learning 100 Plant leaf dataset.

url <- "https://archive.ics.uci.edu/ml/machine-learning-databases/00241/100%20leaves%20plant%20species.zip"
temp_file <- tempfile()
download.file(url, temp_file)
# extracting the whole archive, since the image folders are needed in this post
unzip(temp_file, exdir = ".", overwrite = TRUE)

Listing the image folders available within the downloaded UCI Machine Learning zip file.

working_dir <- "~/Documents/R/aroundrblog/datascienceplus/leaves/100 leaves plant species/data/"
setwd(working_dir)

lf <- list.files()
img_list_len <- length(lf)
lf
##   [1] "Acer_Campestre"               "Acer_Capillipes"             
##   [3] "Acer_Circinatum"              "Acer_Mono"                   
##   [5] "Acer_Opalus"                  "Acer_Palmatum"               
##   [7] "Acer_Pictum"                  "Acer_Platanoids"             
##   [9] "Acer_Rubrum"                  "Acer_Rufinerve"              
##  [11] "Acer_Saccharinum"             "Alnus_Cordata"               
##  [13] "Alnus_Maximowiczii"           "Alnus_Rubra"                 
##  [15] "Alnus_Sieboldiana"            "Alnus_Viridis"               
##  [17] "Arundinaria_Simonii"          "Betula_Austrosinensis"       
##  [19] "Betula_Pendula"               "Callicarpa_Bodinieri"        
##  [21] "Castanea_Sativa"              "Celtis_Koraiensis"           
##  [23] "Cercis_Siliquastrum"          "Cornus_Chinensis"            
##  [25] "Cornus_Controversa"           "Cornus_Macrophylla"          
##  [27] "Cotinus_Coggygria"            "Crataegus_Monogyna"          
##  [29] "Cytisus_Battandieri"          "Eucalyptus_Glaucescens"      
##  [31] "Eucalyptus_Neglecta"          "Eucalyptus_Urnigera"         
##  [33] "Fagus_Sylvatica"              "Ginkgo_Biloba"               
##  [35] "Ilex_Aquifolium"              "Ilex_Cornuta"                
##  [37] "Liquidambar_Styraciflua"      "Liriodendron_Tulipifera"     
##  [39] "Lithocarpus_Cleistocarpus"    "Lithocarpus_Edulis"          
##  [41] "Magnolia_Heptapeta"           "Magnolia_Salicifolia"        
##  [43] "Morus_Nigra"                  "Olea_Europaea"               
##  [45] "Phildelphus"                  "Populus_Adenopoda"           
##  [47] "Populus_Grandidentata"        "Populus_Nigra"               
##  [49] "Prunus_Avium"                 "Prunus_X_Shmittii"           
##  [51] "Pterocarya_Stenoptera"        "Quercus_Afares"              
##  [53] "Quercus_Agrifolia"            "Quercus_Alnifolia"           
##  [55] "Quercus_Brantii"              "Quercus_Canariensis"         
##  [57] "Quercus_Castaneifolia"        "Quercus_Cerris"              
##  [59] "Quercus_Chrysolepis"          "Quercus_Coccifera"           
##  [61] "Quercus_Coccinea"             "Quercus_Crassifolia"         
##  [63] "Quercus_Crassipes"            "Quercus_Dolicholepis"        
##  [65] "Quercus_Ellipsoidalis"        "Quercus_Greggii"             
##  [67] "Quercus_Hartwissiana"         "Quercus_Ilex"                
##  [69] "Quercus_Imbricaria"           "Quercus_Infectoria_sub"      
##  [71] "Quercus_Kewensis"             "Quercus_Nigra"               
##  [73] "Quercus_Palustris"            "Quercus_Phellos"             
##  [75] "Quercus_Phillyraeoides"       "Quercus_Pontica"             
##  [77] "Quercus_Pubescens"            "Quercus_Pyrenaica"           
##  [79] "Quercus_Rhysophylla"          "Quercus_Rubra"               
##  [81] "Quercus_Semecarpifolia"       "Quercus_Shumardii"           
##  [83] "Quercus_Suber"                "Quercus_Texana"              
##  [85] "Quercus_Trojana"              "Quercus_Variabilis"          
##  [87] "Quercus_Vulcanica"            "Quercus_x_Hispanica"         
##  [89] "Quercus_x_Turneri"            "Rhododendron_x_Russellianum" 
##  [91] "Salix_Fragilis"               "Salix_Intergra"              
##  [93] "Sorbus_Aria"                  "Tilia_Oliveri"               
##  [95] "Tilia_Platyphyllos"           "Tilia_Tomentosa"             
##  [97] "Ulmus_Bergmanniana"           "Viburnum_Tinus"              
##  [99] "Viburnum_x_Rhytidophylloides" "Zelkova_Serrata"

There is one subdirectory for each plant species.

data_dir <- paste(working_dir, lf, sep = "")
data_dir[1:5]
## [1] "~/Documents/R/aroundrblog/datascienceplus/leaves/100 leaves plant species/data/Acer_Campestre" 
## [2] "~/Documents/R/aroundrblog/datascienceplus/leaves/100 leaves plant species/data/Acer_Capillipes"
## [3] "~/Documents/R/aroundrblog/datascienceplus/leaves/100 leaves plant species/data/Acer_Circinatum"
## [4] "~/Documents/R/aroundrblog/datascienceplus/leaves/100 leaves plant species/data/Acer_Mono"      
## [5] "~/Documents/R/aroundrblog/datascienceplus/leaves/100 leaves plant species/data/Acer_Opalus"
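
As a side note, the species folders can also be listed with their full paths in a single call, which avoids pasting strings by hand and changing the working directory. The sketch below is an assumption of mine and not the code used in this post: it builds a throwaway directory tree under tempdir() with a few species folders and made-up placeholder files, then counts the files per folder.

```r
# Hypothetical stand-in for the unzipped "data" folder, created under tempdir()
root <- file.path(tempdir(), "leaves_demo")
species <- c("Acer_Campestre", "Acer_Capillipes", "Acer_Circinatum")
for (s in species) {
  dir.create(file.path(root, s), recursive = TRUE, showWarnings = FALSE)
  # three empty placeholder files per species (the real folders hold 16 images)
  file.create(file.path(root, s, sprintf("img_%02d.jpg", 1:3)))
}

# full.names = TRUE returns complete paths, so no paste() or setwd() is needed
demo_dirs <- list.dirs(root, full.names = TRUE, recursive = FALSE)
demo_counts <- sapply(demo_dirs, function(d) length(list.files(d)))
names(demo_counts) <- basename(demo_dirs)
demo_counts
```

Using basename() for the names also keeps the printed counts short, since the long path prefix is dropped.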

The image files are loaded as JPEG and resized to 100x100 pixels; the overall image set is then split into training and test pools. First, I count the files in each subdirectory.

files_in_dir <- sapply(data_dir, function(x) {length(list.files(x))})
files_in_dir
##               ~/Documents/R/aroundrblog/datascienceplus/leaves/100 leaves plant species/data/Acer_Campestre 
##                                                                                                          16 
##              ~/Documents/R/aroundrblog/datascienceplus/leaves/100 leaves plant species/data/Acer_Capillipes 
##                                                                                                          16 
##              ~/Documents/R/aroundrblog/datascienceplus/leaves/100 leaves plant species/data/Acer_Circinatum 
##                                                                                                          16 
##                    ~/Documents/R/aroundrblog/datascienceplus/leaves/100 leaves plant species/data/Acer_Mono 
##                                                                                                          16 
##                  ~/Documents/R/aroundrblog/datascienceplus/leaves/100 leaves plant species/data/Acer_Opalus 
##                                                                                                          16 
##                ~/Documents/R/aroundrblog/datascienceplus/leaves/100 leaves plant species/data/Acer_Palmatum 
##                                                                                                          16 
##                  ~/Documents/R/aroundrblog/datascienceplus/leaves/100 leaves plant species/data/Acer_Pictum 
##                                                                                                          16 
##              ~/Documents/R/aroundrblog/datascienceplus/leaves/100 leaves plant species/data/Acer_Platanoids 
##                                                                                                          16 
##                  ~/Documents/R/aroundrblog/datascienceplus/leaves/100 leaves plant species/data/Acer_Rubrum 
##                                                                                                          16 
##               ~/Documents/R/aroundrblog/datascienceplus/leaves/100 leaves plant species/data/Acer_Rufinerve 
##                                                                                                          16 
##             ~/Documents/R/aroundrblog/datascienceplus/leaves/100 leaves plant species/data/Acer_Saccharinum 
##                                                                                                          16 
##                ~/Documents/R/aroundrblog/datascienceplus/leaves/100 leaves plant species/data/Alnus_Cordata 
##                                                                                                          16 
##           ~/Documents/R/aroundrblog/datascienceplus/leaves/100 leaves plant species/data/Alnus_Maximowiczii 
##                                                                                                          16 
##                  ~/Documents/R/aroundrblog/datascienceplus/leaves/100 leaves plant species/data/Alnus_Rubra 
##                                                                                                          16 
##            ~/Documents/R/aroundrblog/datascienceplus/leaves/100 leaves plant species/data/Alnus_Sieboldiana 
##                                                                                                          16 
##                ~/Documents/R/aroundrblog/datascienceplus/leaves/100 leaves plant species/data/Alnus_Viridis 
##                                                                                                          16 
##          ~/Documents/R/aroundrblog/datascienceplus/leaves/100 leaves plant species/data/Arundinaria_Simonii 
##                                                                                                          16 
##        ~/Documents/R/aroundrblog/datascienceplus/leaves/100 leaves plant species/data/Betula_Austrosinensis 
##                                                                                                          16 
##               ~/Documents/R/aroundrblog/datascienceplus/leaves/100 leaves plant species/data/Betula_Pendula 
##                                                                                                          16 
##         ~/Documents/R/aroundrblog/datascienceplus/leaves/100 leaves plant species/data/Callicarpa_Bodinieri 
##                                                                                                          16 
##              ~/Documents/R/aroundrblog/datascienceplus/leaves/100 leaves plant species/data/Castanea_Sativa 
##                                                                                                          16 
##            ~/Documents/R/aroundrblog/datascienceplus/leaves/100 leaves plant species/data/Celtis_Koraiensis 
##                                                                                                          16 
##          ~/Documents/R/aroundrblog/datascienceplus/leaves/100 leaves plant species/data/Cercis_Siliquastrum 
##                                                                                                          16 
##             ~/Documents/R/aroundrblog/datascienceplus/leaves/100 leaves plant species/data/Cornus_Chinensis 
##                                                                                                          16 
##           ~/Documents/R/aroundrblog/datascienceplus/leaves/100 leaves plant species/data/Cornus_Controversa 
##                                                                                                          16 
##           ~/Documents/R/aroundrblog/datascienceplus/leaves/100 leaves plant species/data/Cornus_Macrophylla 
##                                                                                                          16 
##            ~/Documents/R/aroundrblog/datascienceplus/leaves/100 leaves plant species/data/Cotinus_Coggygria 
##                                                                                                          16 
##           ~/Documents/R/aroundrblog/datascienceplus/leaves/100 leaves plant species/data/Crataegus_Monogyna 
##                                                                                                          16 
##          ~/Documents/R/aroundrblog/datascienceplus/leaves/100 leaves plant species/data/Cytisus_Battandieri 
##                                                                                                          16 
##       ~/Documents/R/aroundrblog/datascienceplus/leaves/100 leaves plant species/data/Eucalyptus_Glaucescens 
##                                                                                                          16 
##          ~/Documents/R/aroundrblog/datascienceplus/leaves/100 leaves plant species/data/Eucalyptus_Neglecta 
##                                                                                                          16 
##          ~/Documents/R/aroundrblog/datascienceplus/leaves/100 leaves plant species/data/Eucalyptus_Urnigera 
##                                                                                                          16 
##              ~/Documents/R/aroundrblog/datascienceplus/leaves/100 leaves plant species/data/Fagus_Sylvatica 
##                                                                                                          16 
##                ~/Documents/R/aroundrblog/datascienceplus/leaves/100 leaves plant species/data/Ginkgo_Biloba 
##                                                                                                          16 
##              ~/Documents/R/aroundrblog/datascienceplus/leaves/100 leaves plant species/data/Ilex_Aquifolium 
##                                                                                                          16 
##                 ~/Documents/R/aroundrblog/datascienceplus/leaves/100 leaves plant species/data/Ilex_Cornuta 
##                                                                                                          16 
##      ~/Documents/R/aroundrblog/datascienceplus/leaves/100 leaves plant species/data/Liquidambar_Styraciflua 
##                                                                                                          16 
##      ~/Documents/R/aroundrblog/datascienceplus/leaves/100 leaves plant species/data/Liriodendron_Tulipifera 
##                                                                                                          16 
##    ~/Documents/R/aroundrblog/datascienceplus/leaves/100 leaves plant species/data/Lithocarpus_Cleistocarpus 
##                                                                                                          16 
##           ~/Documents/R/aroundrblog/datascienceplus/leaves/100 leaves plant species/data/Lithocarpus_Edulis 
##                                                                                                          16 
##           ~/Documents/R/aroundrblog/datascienceplus/leaves/100 leaves plant species/data/Magnolia_Heptapeta 
##                                                                                                          16 
##         ~/Documents/R/aroundrblog/datascienceplus/leaves/100 leaves plant species/data/Magnolia_Salicifolia 
##                                                                                                          16 
##                  ~/Documents/R/aroundrblog/datascienceplus/leaves/100 leaves plant species/data/Morus_Nigra 
##                                                                                                          16 
##                ~/Documents/R/aroundrblog/datascienceplus/leaves/100 leaves plant species/data/Olea_Europaea 
##                                                                                                          16 
##                  ~/Documents/R/aroundrblog/datascienceplus/leaves/100 leaves plant species/data/Phildelphus 
##                                                                                                          16 
##            ~/Documents/R/aroundrblog/datascienceplus/leaves/100 leaves plant species/data/Populus_Adenopoda 
##                                                                                                          16 
##        ~/Documents/R/aroundrblog/datascienceplus/leaves/100 leaves plant species/data/Populus_Grandidentata 
##                                                                                                          16 
##                ~/Documents/R/aroundrblog/datascienceplus/leaves/100 leaves plant species/data/Populus_Nigra 
##                                                                                                          16 
##                 ~/Documents/R/aroundrblog/datascienceplus/leaves/100 leaves plant species/data/Prunus_Avium 
##                                                                                                          16 
##            ~/Documents/R/aroundrblog/datascienceplus/leaves/100 leaves plant species/data/Prunus_X_Shmittii 
##                                                                                                          16 
##        ~/Documents/R/aroundrblog/datascienceplus/leaves/100 leaves plant species/data/Pterocarya_Stenoptera 
##                                                                                                          16 
##               ~/Documents/R/aroundrblog/datascienceplus/leaves/100 leaves plant species/data/Quercus_Afares 
##                                                                                                          16 
##            ~/Documents/R/aroundrblog/datascienceplus/leaves/100 leaves plant species/data/Quercus_Agrifolia 
##                                                                                                          16 
##            ~/Documents/R/aroundrblog/datascienceplus/leaves/100 leaves plant species/data/Quercus_Alnifolia 
##                                                                                                          16 
##              ~/Documents/R/aroundrblog/datascienceplus/leaves/100 leaves plant species/data/Quercus_Brantii 
##                                                                                                          16 
##          ~/Documents/R/aroundrblog/datascienceplus/leaves/100 leaves plant species/data/Quercus_Canariensis 
##                                                                                                          16 
##        ~/Documents/R/aroundrblog/datascienceplus/leaves/100 leaves plant species/data/Quercus_Castaneifolia 
##                                                                                                          16 
##               ~/Documents/R/aroundrblog/datascienceplus/leaves/100 leaves plant species/data/Quercus_Cerris 
##                                                                                                          16 
##          ~/Documents/R/aroundrblog/datascienceplus/leaves/100 leaves plant species/data/Quercus_Chrysolepis 
##                                                                                                          16 
##            ~/Documents/R/aroundrblog/datascienceplus/leaves/100 leaves plant species/data/Quercus_Coccifera 
##                                                                                                          16 
##             ~/Documents/R/aroundrblog/datascienceplus/leaves/100 leaves plant species/data/Quercus_Coccinea 
##                                                                                                          16 
##          ~/Documents/R/aroundrblog/datascienceplus/leaves/100 leaves plant species/data/Quercus_Crassifolia 
##                                                                                                          16 
##            ~/Documents/R/aroundrblog/datascienceplus/leaves/100 leaves plant species/data/Quercus_Crassipes 
##                                                                                                          16 
##         ~/Documents/R/aroundrblog/datascienceplus/leaves/100 leaves plant species/data/Quercus_Dolicholepis 
##                                                                                                          16 
##        ~/Documents/R/aroundrblog/datascienceplus/leaves/100 leaves plant species/data/Quercus_Ellipsoidalis 
##                                                                                                          16 
##              ~/Documents/R/aroundrblog/datascienceplus/leaves/100 leaves plant species/data/Quercus_Greggii 
##                                                                                                          16 
##         ~/Documents/R/aroundrblog/datascienceplus/leaves/100 leaves plant species/data/Quercus_Hartwissiana 
##                                                                                                          16 
##                 ~/Documents/R/aroundrblog/datascienceplus/leaves/100 leaves plant species/data/Quercus_Ilex 
##                                                                                                          16 
##           ~/Documents/R/aroundrblog/datascienceplus/leaves/100 leaves plant species/data/Quercus_Imbricaria 
##                                                                                                          16 
##       ~/Documents/R/aroundrblog/datascienceplus/leaves/100 leaves plant species/data/Quercus_Infectoria_sub 
##                                                                                                          16 
##             ~/Documents/R/aroundrblog/datascienceplus/leaves/100 leaves plant species/data/Quercus_Kewensis 
##                                                                                                          16 
##                ~/Documents/R/aroundrblog/datascienceplus/leaves/100 leaves plant species/data/Quercus_Nigra 
##                                                                                                          16 
##            ~/Documents/R/aroundrblog/datascienceplus/leaves/100 leaves plant species/data/Quercus_Palustris 
##                                                                                                          16 
##              ~/Documents/R/aroundrblog/datascienceplus/leaves/100 leaves plant species/data/Quercus_Phellos 
##                                                                                                          16 
##       ~/Documents/R/aroundrblog/datascienceplus/leaves/100 leaves plant species/data/Quercus_Phillyraeoides 
##                                                                                                          16 
##              ~/Documents/R/aroundrblog/datascienceplus/leaves/100 leaves plant species/data/Quercus_Pontica 
##                                                                                                          16 
##            ~/Documents/R/aroundrblog/datascienceplus/leaves/100 leaves plant species/data/Quercus_Pubescens 
##                                                                                                          16 
##            ~/Documents/R/aroundrblog/datascienceplus/leaves/100 leaves plant species/data/Quercus_Pyrenaica 
##                                                                                                          16 
##          ~/Documents/R/aroundrblog/datascienceplus/leaves/100 leaves plant species/data/Quercus_Rhysophylla 
##                                                                                                          16 
##                ~/Documents/R/aroundrblog/datascienceplus/leaves/100 leaves plant species/data/Quercus_Rubra 
##                                                                                                          16 
##       ~/Documents/R/aroundrblog/datascienceplus/leaves/100 leaves plant species/data/Quercus_Semecarpifolia 
##                                                                                                          16 
##            ~/Documents/R/aroundrblog/datascienceplus/leaves/100 leaves plant species/data/Quercus_Shumardii 
##                                                                                                          16 
##                ~/Documents/R/aroundrblog/datascienceplus/leaves/100 leaves plant species/data/Quercus_Suber 
##                                                                                                          16 
##               ~/Documents/R/aroundrblog/datascienceplus/leaves/100 leaves plant species/data/Quercus_Texana 
##                                                                                                          16 
##              ~/Documents/R/aroundrblog/datascienceplus/leaves/100 leaves plant species/data/Quercus_Trojana 
##                                                                                                          16 
##           ~/Documents/R/aroundrblog/datascienceplus/leaves/100 leaves plant species/data/Quercus_Variabilis 
##                                                                                                          16 
##            ~/Documents/R/aroundrblog/datascienceplus/leaves/100 leaves plant species/data/Quercus_Vulcanica 
##                                                                                                          16 
##          ~/Documents/R/aroundrblog/datascienceplus/leaves/100 leaves plant species/data/Quercus_x_Hispanica 
##                                                                                                          16 
##            ~/Documents/R/aroundrblog/datascienceplus/leaves/100 leaves plant species/data/Quercus_x_Turneri 
##                                                                                                          16 
##  ~/Documents/R/aroundrblog/datascienceplus/leaves/100 leaves plant species/data/Rhododendron_x_Russellianum 
##                                                                                                          16 
##               ~/Documents/R/aroundrblog/datascienceplus/leaves/100 leaves plant species/data/Salix_Fragilis 
##                                                                                                          16 
##               ~/Documents/R/aroundrblog/datascienceplus/leaves/100 leaves plant species/data/Salix_Intergra 
##                                                                                                          16 
##                  ~/Documents/R/aroundrblog/datascienceplus/leaves/100 leaves plant species/data/Sorbus_Aria 
##                                                                                                          16 
##                ~/Documents/R/aroundrblog/datascienceplus/leaves/100 leaves plant species/data/Tilia_Oliveri 
##                                                                                                          16 
##           ~/Documents/R/aroundrblog/datascienceplus/leaves/100 leaves plant species/data/Tilia_Platyphyllos 
##                                                                                                          16 
##              ~/Documents/R/aroundrblog/datascienceplus/leaves/100 leaves plant species/data/Tilia_Tomentosa 
##                                                                                                          16 
##           ~/Documents/R/aroundrblog/datascienceplus/leaves/100 leaves plant species/data/Ulmus_Bergmanniana 
##                                                                                                          16 
##               ~/Documents/R/aroundrblog/datascienceplus/leaves/100 leaves plant species/data/Viburnum_Tinus 
##                                                                                                          16 
## ~/Documents/R/aroundrblog/datascienceplus/leaves/100 leaves plant species/data/Viburnum_x_Rhytidophylloides 
##                                                                                                          16 
##              ~/Documents/R/aroundrblog/datascienceplus/leaves/100 leaves plant species/data/Zelkova_Serrata 
##                                                                                                          16
length(files_in_dir)
## [1] 100

The output above tells us that there are 100 plant species folders with 16 images each. So only a few images are available for each plant species; I will come back to this in the Conclusions paragraph.

set.seed(1023)

img_dim <- 100
train_size <- 0.8

x_train <- list()
y_train <- list()
x_test <- list()
y_test <- list()
imgname_test <- list()

# considering all directories where images are located
for (i in 1:length(data_dir)) {
  # changing directory
  setwd(data_dir[[i]])

  # list the files of a subfolder where one species is located
  fl <- list.files()
  l <- length(fl)
  # sampling to determine the training set
  s <- sample(1:l, train_size*l)

  # training and test names list
  train_x_name <- fl[s]
  test_x_name <- fl[setdiff(1:l, s)]
  
  # loading the training images listed in train_x_name
  img_x_l <- lapply(train_x_name, readJPEG)
  img_x_l <- lapply(img_x_l, function(x) {resize(x, w=img_dim, h=img_dim)})

  # updating the training x data
  x_train <- c(x_train, img_x_l)
  # the y train data represent an identifier associated to the
  # subfolder in processing, hence associated to one species
  y_train <- c(y_train, rep(i-1, length(s))) # image folders indexes need to be zero based
  
  # loading the test images listed in test_x_name
  img_x_l <- lapply(test_x_name, readJPEG)
  img_x_l <- lapply(img_x_l, function(x) {resize(x, w=img_dim, h=img_dim)})

  # updating the test x data and y data in a similar fashion as the training data
  x_test <- c(x_test, img_x_l)
  y_test <- c(y_test, rep(i-1, l-length(s))) # image folders indexes need to be zero based
}
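
A note on the split size: sample() keeps only the integer part of a fractional size, so with 16 images per folder train_size * l = 12.8 becomes 12 training images and 4 test images per species. The effective training share is therefore 75% rather than 80%. The lines below are only a sketch to check that arithmetic; n_per_species, n_species and the two counters are names introduced here for illustration.

```r
n_per_species <- 16   # images per folder, as counted above
n_species <- 100      # number of species folders
train_size <- 0.8

# sample() keeps only the integer part of a fractional size (12.8 -> 12)
s <- sample(seq_len(n_per_species), train_size * n_per_species)
n_train_per_species <- length(s)
n_test_per_species <- n_per_species - n_train_per_species

# overall number of training and test images
c(train = n_train_per_species * n_species,
  test  = n_test_per_species * n_species)
```

If an exact 80/20 split were needed, the size could be rounded explicitly, for instance with round() or ceiling(), before calling sample().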

Let us have a look at what the plant leaf images look like.

image(x_train[[1]])

image(x_train[[20]])

image(x_train[[50]])

The training, validation and test datasets are then built.

leaves_img <- list()
leaves_img$train <- list()
leaves_img$test <- list()

leaves_img$train$x <- array(0, c(length(x_train), img_dim, img_dim))

for (k in 1:length(x_train)) {
  leaves_img$train$x[k,,] <- x_train[[k]]
}

leaves_img$train$y <- unlist(y_train)

leaves_img$test$x <- array(0, c(length(x_test), img_dim, img_dim))
dim(leaves_img$test$x)
## [1] 400 100 100
for (k in 1:length(x_test)) {
  leaves_img$test$x[k,,] <- x_test[[k]]
}

leaves_img$test$y <- unlist(y_test)

x_train <- leaves_img$train$x
y_train <- leaves_img$train$y
x_test <- leaves_img$test$x
y_test <- leaves_img$test$y

x_train <- array_reshape(x_train, c(nrow(x_train), c(img_dim,img_dim,1)))
y_train <- to_categorical(y_train, img_list_len)

x_test <- array_reshape(x_test, c(nrow(x_test), c(img_dim,img_dim,1)))
y_test <- to_categorical(y_test, img_list_len)

val_len <- nrow(y_test)/2
val_seq <- seq(from=1, to=nrow(y_test), by=2)

x_val <- x_test[val_seq,,,,drop=FALSE]
y_val <- y_test[val_seq,,drop=FALSE]

x_test <- x_test[-c(val_seq),,,,drop=FALSE]
y_test <- y_test[-c(val_seq),,drop=FALSE]
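
For readers unfamiliar with to_categorical(), the base R lines below sketch the same idea on a toy label vector: each zero-based class identifier becomes a row holding a single 1. This is an illustration only and not the keras implementation; toy_labels, one_hot and recovered are names made up here.

```r
toy_labels <- c(0, 0, 1, 2)   # zero-based class identifiers
n_classes <- 3

# row k of the identity matrix is the one-hot vector for class k - 1
one_hot <- diag(n_classes)[toy_labels + 1, , drop = FALSE]
one_hot

# the original labels can be recovered from the position of the 1 in each row
recovered <- max.col(one_hot) - 1
recovered
```

This is also why the folder indexes are stored as i-1 in the loading loop: the encoding expects class identifiers that start at zero.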

In the end, we have split our set of 1600 images into training, validation and test datasets with the following numbers of records.

nrow(y_train)
## [1] 1200
nrow(y_val)
## [1] 200
nrow(y_test)
## [1] 200
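
Because the held-out images are stored species by species, four in a row, taking every other row sends two images of each species to the validation set and leaves two in the test set. The sketch below checks this on a label vector alone; test_labels is a stand-in built here and not an object from the code above.

```r
# 100 species, 4 held-out images each, stored in folder order (zero-based labels)
test_labels <- rep(0:99, each = 4)

# same alternating index used for the validation split
val_seq <- seq(from = 1, to = length(test_labels), by = 2)
val_labels <- test_labels[val_seq]
rest_labels <- test_labels[-val_seq]

# how many species appear how many times in each subset
table(table(val_labels))
table(table(rest_labels))
```

Both tables should show a single entry, meaning that every species is represented the same number of times in the validation and in the test set.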

A Keras sequential model is then defined: a stack of convolutional layers intermixed with pooling and dropout layers, followed by a flatten layer and two dense layers.

model <- keras_model_sequential() %>% 
  layer_conv_2d(filters = 32, kernel_size = c(3,3), activation = 'relu', 
                input_shape = c(img_dim, img_dim, 1)) %>%
  layer_max_pooling_2d(pool_size = c(2,2)) %>%
  layer_dropout(rate = 0.5) %>%
  layer_conv_2d(filters = 64, kernel_size = c(3,3), activation = 'relu') %>%
  layer_max_pooling_2d(pool_size = c(2,2)) %>%
  layer_dropout(rate = 0.5) %>%
  layer_conv_2d(filters = 128, kernel_size = c(3,3), activation = 'relu')  %>%
  layer_max_pooling_2d(pool_size = c(2,2)) %>%
  layer_dropout(rate = 0.5) %>%
  layer_conv_2d(filters = 128, kernel_size = c(3,3), activation = 'relu')  %>%
  layer_max_pooling_2d(pool_size = c(2,2)) %>%
  layer_dropout(rate = 0.5)

model <- model %>% layer_flatten() %>%
  layer_dense(units = 256, activation = 'relu') %>% 
  layer_dropout(rate = 0.5) %>%
  layer_dense(units = img_list_len, activation = 'softmax')

summary(model)
## ___________________________________________________________________________
## Layer (type)                     Output Shape                  Param #     
## ===========================================================================
## conv2d_1 (Conv2D)                (None, 98, 98, 32)            320         
## ___________________________________________________________________________
## max_pooling2d_1 (MaxPooling2D)   (None, 49, 49, 32)            0           
## ___________________________________________________________________________
## dropout_1 (Dropout)              (None, 49, 49, 32)            0           
## ___________________________________________________________________________
## conv2d_2 (Conv2D)                (None, 47, 47, 64)            18496       
## ___________________________________________________________________________
## max_pooling2d_2 (MaxPooling2D)   (None, 23, 23, 64)            0           
## ___________________________________________________________________________
## dropout_2 (Dropout)              (None, 23, 23, 64)            0           
## ___________________________________________________________________________
## conv2d_3 (Conv2D)                (None, 21, 21, 128)           73856       
## ___________________________________________________________________________
## max_pooling2d_3 (MaxPooling2D)   (None, 10, 10, 128)           0           
## ___________________________________________________________________________
## dropout_3 (Dropout)              (None, 10, 10, 128)           0           
## ___________________________________________________________________________
## conv2d_4 (Conv2D)                (None, 8, 8, 128)             147584      
## ___________________________________________________________________________
## max_pooling2d_4 (MaxPooling2D)   (None, 4, 4, 128)             0           
## ___________________________________________________________________________
## dropout_4 (Dropout)              (None, 4, 4, 128)             0           
## ___________________________________________________________________________
## flatten_1 (Flatten)              (None, 2048)                  0           
## ___________________________________________________________________________
## dense_1 (Dense)                  (None, 256)                   524544      
## ___________________________________________________________________________
## dropout_5 (Dropout)              (None, 256)                   0           
## ___________________________________________________________________________
## dense_2 (Dense)                  (None, 100)                   25700       
## ===========================================================================
## Total params: 790,500
## Trainable params: 790,500
## Non-trainable params: 0
## ___________________________________________________________________________
model %>% compile(
  loss = 'categorical_crossentropy',
  optimizer = optimizer_adam(lr=1e-3),
  metrics = c('accuracy')
)
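The output shapes and parameter counts reported by summary(model) can be checked by hand. The sketch below uses base R only; the helper functions are illustrative names, not part of keras, and it assumes unpadded 3x3 convolutions with stride 1 and 2x2 pooling, as in the model definition.

```r
# Valid 3x3 convolution shrinks each side by 2; 2x2 max pooling halves it (rounding down)
conv_out <- function(size, kernel = 3) size - kernel + 1
pool_out <- function(size, pool = 2) size %/% pool

# Conv layer parameters: one kernel weight per (input channel, filter) plus one bias per filter
conv_params <- function(in_ch, filters, kernel = 3) (kernel * kernel * in_ch + 1) * filters
# Dense layer parameters: one weight per (input, unit) plus one bias per unit
dense_params <- function(inputs, units) (inputs + 1) * units

size <- 100      # image side, as implied by the 98x98 output of the first convolution
channels <- 1    # grayscale input
filters <- c(32, 64, 128, 128)
params <- c()

for (f in filters) {
  params <- c(params, conv_params(channels, f))
  size <- pool_out(conv_out(size))
  channels <- f
}

flat <- size * size * channels
params <- c(params, dense_params(flat, 256), dense_params(256, 100))

size         # 4
flat         # 2048
params       # 320 18496 73856 147584 524544 25700
sum(params)  # 790500
```

About two thirds of the parameters sit in the first dense layer, which is one reason a dropout layer is placed right after it.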

Finally, we fit the deep learning model.

# https://github.com/keras-team/keras/issues/4298
#
# validation_split does not allow for val_acc different from zero

history <- model %>% fit(
  x_train, 
  y_train, 
  epochs = 40,
  batch_size = 10,
  validation_data = list(x_val, y_val),
  shuffle = TRUE
)

plot(history)

We evaluate our model against the training dataset.

model %>% evaluate(x_train, y_train)
## $loss
## [1] 0.5067297
## 
## $acc
## [1] 0.9183333

We then evaluate our model accuracy against the test dataset, which is the figure of interest.

model %>% evaluate(x_test, y_test)
## $loss
## [1] 0.9157
## 
## $acc
## [1] 0.77

It is also interesting to look at the predictions and compare them with the actual test dataset values.

pred_res <- model %>% predict_classes(x_test)
head(pred_res) # zero based species identifiers
## [1] 0 8 1 1 0 2
test_image_ids <- leaves_img$test$y[-c(val_seq)]
comparison <- data.frame(pred_res, test_image_ids)
colnames(comparison) <- c("prediction", "actual")
head(comparison,30) # zero based species identifiers
##    prediction actual
## 1           0      0
## 2           8      0
## 3           1      1
## 4           1      1
## 5           0      2
## 6           2      2
## 7           3      3
## 8           3      3
## 9           4      4
## 10          4      4
## 11          5      5
## 12          5      5
## 13          6      6
## 14          6      6
## 15          7      7
## 16          7      7
## 17          8      8
## 18          8      8
## 19          9      9
## 20          9      9
## 21         10     10
## 22         10     10
## 23         11     11
## 24         11     11
## 25         12     12
## 26         12     12
## 27         13     13
## 28         13     13
## 29         14     14
## 30         14     14
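The zero-based identifiers returned by predict_classes() correspond to the position of the largest softmax output in each row, minus one. As far as I know, predict_classes() is not available in more recent keras releases, so the sketch below shows the same computation in base R on a hypothetical probability matrix; with the real model, `probs` would be the result of `model %>% predict(x_test)`.

```r
# Hypothetical softmax outputs for 3 images over 4 classes (each row sums to 1)
probs <- matrix(c(0.70, 0.10, 0.10, 0.10,
                  0.05, 0.05, 0.80, 0.10,
                  0.20, 0.50, 0.20, 0.10),
                nrow = 3, byrow = TRUE)

# max.col() returns the one-based column of each row maximum;
# subtracting 1 gives zero-based class identifiers
pred_classes <- max.col(probs, ties.method = "first") - 1
pred_classes  # 0 2 1
```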

Let us compute accuracy by species.

accuracy_by_species <- comparison %>% group_by(actual) %>% summarise(round(100*sum(actual == prediction)/n(), 2))
accuracy_by_species_df <- as.data.frame(accuracy_by_species)
colnames(accuracy_by_species_df) <- c("species_id", "accuracy")
species_names <- lf[accuracy_by_species_df$species_id+1]
test_result <- data.frame(species=species_names, accuracy=accuracy_by_species_df$accuracy)
test_result
##                          species accuracy
## 1                 Acer_Campestre       50
## 2                Acer_Capillipes      100
## 3                Acer_Circinatum       50
## 4                      Acer_Mono      100
## 5                    Acer_Opalus      100
## 6                  Acer_Palmatum      100
## 7                    Acer_Pictum      100
## 8                Acer_Platanoids      100
## 9                    Acer_Rubrum      100
## 10                Acer_Rufinerve      100
## 11              Acer_Saccharinum      100
## 12                 Alnus_Cordata      100
## 13            Alnus_Maximowiczii      100
## 14                   Alnus_Rubra      100
## 15             Alnus_Sieboldiana      100
## 16                 Alnus_Viridis       50
## 17           Arundinaria_Simonii      100
## 18         Betula_Austrosinensis       50
## 19                Betula_Pendula      100
## 20          Callicarpa_Bodinieri      100
## 21               Castanea_Sativa      100
## 22             Celtis_Koraiensis        0
## 23           Cercis_Siliquastrum      100
## 24              Cornus_Chinensis       50
## 25            Cornus_Controversa        0
## 26            Cornus_Macrophylla        0
## 27             Cotinus_Coggygria      100
## 28            Crataegus_Monogyna      100
## 29           Cytisus_Battandieri      100
## 30        Eucalyptus_Glaucescens      100
## 31           Eucalyptus_Neglecta        0
## 32           Eucalyptus_Urnigera      100
## 33               Fagus_Sylvatica       50
## 34                 Ginkgo_Biloba      100
## 35               Ilex_Aquifolium       50
## 36                  Ilex_Cornuta      100
## 37       Liquidambar_Styraciflua      100
## 38       Liriodendron_Tulipifera      100
## 39     Lithocarpus_Cleistocarpus       50
## 40            Lithocarpus_Edulis      100
## 41            Magnolia_Heptapeta      100
## 42          Magnolia_Salicifolia       50
## 43                   Morus_Nigra      100
## 44                 Olea_Europaea      100
## 45                   Phildelphus       50
## 46             Populus_Adenopoda       50
## 47         Populus_Grandidentata      100
## 48                 Populus_Nigra      100
## 49                  Prunus_Avium      100
## 50             Prunus_X_Shmittii      100
## 51         Pterocarya_Stenoptera       50
## 52                Quercus_Afares       50
## 53             Quercus_Agrifolia      100
## 54             Quercus_Alnifolia       50
## 55               Quercus_Brantii      100
## 56           Quercus_Canariensis      100
## 57         Quercus_Castaneifolia      100
## 58                Quercus_Cerris       50
## 59           Quercus_Chrysolepis        0
## 60             Quercus_Coccifera      100
## 61              Quercus_Coccinea      100
## 62           Quercus_Crassifolia       50
## 63             Quercus_Crassipes      100
## 64          Quercus_Dolicholepis        0
## 65         Quercus_Ellipsoidalis       50
## 66               Quercus_Greggii      100
## 67          Quercus_Hartwissiana       50
## 68                  Quercus_Ilex      100
## 69            Quercus_Imbricaria       50
## 70        Quercus_Infectoria_sub      100
## 71              Quercus_Kewensis       50
## 72                 Quercus_Nigra      100
## 73             Quercus_Palustris      100
## 74               Quercus_Phellos      100
## 75        Quercus_Phillyraeoides       50
## 76               Quercus_Pontica      100
## 77             Quercus_Pubescens      100
## 78             Quercus_Pyrenaica      100
## 79           Quercus_Rhysophylla        0
## 80                 Quercus_Rubra      100
## 81        Quercus_Semecarpifolia      100
## 82             Quercus_Shumardii      100
## 83                 Quercus_Suber       50
## 84                Quercus_Texana       50
## 85               Quercus_Trojana      100
## 86            Quercus_Variabilis      100
## 87             Quercus_Vulcanica       50
## 88           Quercus_x_Hispanica       50
## 89             Quercus_x_Turneri       50
## 90   Rhododendron_x_Russellianum      100
## 91                Salix_Fragilis       50
## 92                Salix_Intergra       50
## 93                   Sorbus_Aria      100
## 94                 Tilia_Oliveri      100
## 95            Tilia_Platyphyllos      100
## 96               Tilia_Tomentosa       50
## 97            Ulmus_Bergmanniana      100
## 98                Viburnum_Tinus       50
## 99  Viburnum_x_Rhytidophylloides        0
## 100              Zelkova_Serrata      100
table(test_result$accuracy)
## 
##   0  50 100 
##   8  30  62
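As a consistency check, the overall test accuracy can be recovered from the table above. This sketch assumes every species contributes the same number of test images, which the 0/50/100 accuracy levels suggest (two images per species).

```r
# Counts taken from the table(test_result$accuracy) output
accuracy_levels <- c(0, 50, 100)
species_counts  <- c(8, 30, 62)

# With equal numbers of test images per species, a weighted mean gives the overall accuracy
overall <- sum(accuracy_levels * species_counts) / sum(species_counts)
overall  # 77
```

The result agrees with the 0.77 accuracy returned by evaluate() on the test dataset.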

Let us have a look at wrong predictions.

not_matching <- comparison[(comparison$prediction != comparison$actual),]
nrow(not_matching)
## [1] 46
nrow(not_matching)/nrow(comparison)
## [1] 0.23
head(not_matching, 10) # zero based species identifiers
##    prediction actual
## 2           8      0
## 5           0      2
## 32         11     15
## 36         24     17
## 43         26     21
## 44         26     21
## 47         41     23
## 49         38     24
## 50         26     24
## 51         39     25

Let us show how some of the wrong predictions missed the actual plant leaf species. We compare images side by side, with the actual species on the left and the wrongly predicted species on the right.

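for (k in 1:10) {

  pred_id <- not_matching[k, "prediction"] # zero based species identifiers
  actual_id <- not_matching[k, "actual"]

  # row names of not_matching are the positions of the misclassified images in x_test
  actual_idx <- as.integer(rownames(not_matching)[k])
  # first test image of the predicted species, taken as its representative
  predicted_idx <- which(comparison$actual == pred_id)[1]

  actual_image <- x_test[actual_idx,,,]
  actual_image_name <- lf[actual_id + 1] # image folders indexes were zero based
  predicted_image <- x_test[predicted_idx,,,]
  predicted_image_name <- lf[pred_id + 1]

  top_label <- paste(actual_image_name, predicted_image_name, sep= "           ")
  img1 <-  rasterGrob(as.raster(actual_image), interpolate = FALSE)
  img2 <-  rasterGrob(as.raster(predicted_image), interpolate = FALSE)
  
  grid.arrange(img1, img2, ncol = 2, top = top_label)
}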

Conclusions

We have shown a deep learning model able to classify the images provided within the 100 plant leaf dataset zip file. The achieved test accuracy of 77% is good; however, it may be improved by data augmentation, a technique that increases the number of available images for each plant species. That may help the training phase and, as a result, yield higher accuracy.
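To give an idea of what data augmentation means in practice, here is a minimal sketch in base R that derives extra images from one by mirroring and rotating it. The helper functions and the 3x3 matrix are hypothetical and not part of the analysis above; it also assumes that flipping or rotating a leaf does not change its species. The keras package offers image_data_generator() for on-the-fly augmentation, which is not shown here.

```r
# Hypothetical 3x3 grayscale "image" standing in for one 100x100 leaf image
img <- matrix(1:9, nrow = 3, byrow = TRUE)

flip_horizontal <- function(m) m[, ncol(m):1, drop = FALSE]     # mirror left-right
flip_vertical   <- function(m) m[nrow(m):1, , drop = FALSE]     # mirror top-bottom
rotate_90       <- function(m) t(m[nrow(m):1, , drop = FALSE])  # rotate clockwise by 90 degrees

augmented <- list(
  original   = img,
  flipped_h  = flip_horizontal(img),
  flipped_v  = flip_vertical(img),
  rotated_90 = rotate_90(img)
)

length(augmented)  # 4 images obtained from 1
```

Each augmented image keeps the label of the image it was derived from, so only the training arrays would need to be extended.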

References

[1] Charles Mallah, James Cope and James Orwell, "Plant leaf classification using probabilistic integration of shape, texture and margin features" [https://www.researchgate.net/publication/266632357_Plant_Leaf_Classification_using_Probabilistic_Integration_of_Shape_Texture_and_Margin_Features]

[2] 100 Plant Leaf Dataset [https://archive.ics.uci.edu/ml/datasets/One-hundred+plant+species+leaves+data+set]
