Plant Leaf Classification with Keras
Abstract
The same UCI Machine Learning dataset I used in previous posts also provides the plant leaf images from which the shape, texture and margin features [1] were extracted. In this post I fit a deep learning model with Keras, using TensorFlow as the back-end, to classify the plant leaf images directly.
Packages
suppressPackageStartupMessages(library(jpeg))
suppressPackageStartupMessages(library(EBImage))
suppressPackageStartupMessages(library(keras))
suppressPackageStartupMessages(library(grid))
suppressPackageStartupMessages(library(gridExtra))
suppressPackageStartupMessages(library(dplyr))
Analysis
Downloading and unzipping the UCI Machine Learning one-hundred plant species leaves dataset [2].
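url <- "https://archive.ics.uci.edu/ml/machine-learning-databases/00241/100%20leaves%20plant%20species.zip"
temp_file <- tempfile(fileext = ".zip")
# binary mode avoids corrupting the zip file on Windows
download.file(url, temp_file, mode = "wb")
# extract the whole archive: besides the margin, shape and texture files,
# it contains the "data" folder with one subfolder of images per species
unzip(temp_file, exdir = ".", overwrite = TRUE)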
Listing the species folders available in the downloaded UCI Machine Learning zip file.
working_dir <- "~/Documents/R/aroundrblog/datascienceplus/leaves/100 leaves plant species/data/"
setwd(working_dir)
lf <- list.files()
img_list_len <- length(lf)
lf
## [1] "Acer_Campestre" "Acer_Capillipes"
## [3] "Acer_Circinatum" "Acer_Mono"
## [5] "Acer_Opalus" "Acer_Palmatum"
## [7] "Acer_Pictum" "Acer_Platanoids"
## [9] "Acer_Rubrum" "Acer_Rufinerve"
## [11] "Acer_Saccharinum" "Alnus_Cordata"
## [13] "Alnus_Maximowiczii" "Alnus_Rubra"
## [15] "Alnus_Sieboldiana" "Alnus_Viridis"
## [17] "Arundinaria_Simonii" "Betula_Austrosinensis"
## [19] "Betula_Pendula" "Callicarpa_Bodinieri"
## [21] "Castanea_Sativa" "Celtis_Koraiensis"
## [23] "Cercis_Siliquastrum" "Cornus_Chinensis"
## [25] "Cornus_Controversa" "Cornus_Macrophylla"
## [27] "Cotinus_Coggygria" "Crataegus_Monogyna"
## [29] "Cytisus_Battandieri" "Eucalyptus_Glaucescens"
## [31] "Eucalyptus_Neglecta" "Eucalyptus_Urnigera"
## [33] "Fagus_Sylvatica" "Ginkgo_Biloba"
## [35] "Ilex_Aquifolium" "Ilex_Cornuta"
## [37] "Liquidambar_Styraciflua" "Liriodendron_Tulipifera"
## [39] "Lithocarpus_Cleistocarpus" "Lithocarpus_Edulis"
## [41] "Magnolia_Heptapeta" "Magnolia_Salicifolia"
## [43] "Morus_Nigra" "Olea_Europaea"
## [45] "Phildelphus" "Populus_Adenopoda"
## [47] "Populus_Grandidentata" "Populus_Nigra"
## [49] "Prunus_Avium" "Prunus_X_Shmittii"
## [51] "Pterocarya_Stenoptera" "Quercus_Afares"
## [53] "Quercus_Agrifolia" "Quercus_Alnifolia"
## [55] "Quercus_Brantii" "Quercus_Canariensis"
## [57] "Quercus_Castaneifolia" "Quercus_Cerris"
## [59] "Quercus_Chrysolepis" "Quercus_Coccifera"
## [61] "Quercus_Coccinea" "Quercus_Crassifolia"
## [63] "Quercus_Crassipes" "Quercus_Dolicholepis"
## [65] "Quercus_Ellipsoidalis" "Quercus_Greggii"
## [67] "Quercus_Hartwissiana" "Quercus_Ilex"
## [69] "Quercus_Imbricaria" "Quercus_Infectoria_sub"
## [71] "Quercus_Kewensis" "Quercus_Nigra"
## [73] "Quercus_Palustris" "Quercus_Phellos"
## [75] "Quercus_Phillyraeoides" "Quercus_Pontica"
## [77] "Quercus_Pubescens" "Quercus_Pyrenaica"
## [79] "Quercus_Rhysophylla" "Quercus_Rubra"
## [81] "Quercus_Semecarpifolia" "Quercus_Shumardii"
## [83] "Quercus_Suber" "Quercus_Texana"
## [85] "Quercus_Trojana" "Quercus_Variabilis"
## [87] "Quercus_Vulcanica" "Quercus_x_Hispanica"
## [89] "Quercus_x_Turneri" "Rhododendron_x_Russellianum"
## [91] "Salix_Fragilis" "Salix_Intergra"
## [93] "Sorbus_Aria" "Tilia_Oliveri"
## [95] "Tilia_Platyphyllos" "Tilia_Tomentosa"
## [97] "Ulmus_Bergmanniana" "Viburnum_Tinus"
## [99] "Viburnum_x_Rhytidophylloides" "Zelkova_Serrata"
There is one subdirectory for each plant species.
data_dir <- paste(working_dir, lf, sep = "")
data_dir[1:5]
## [1] "~/Documents/R/aroundrblog/datascienceplus/leaves/100 leaves plant species/data/Acer_Campestre"
## [2] "~/Documents/R/aroundrblog/datascienceplus/leaves/100 leaves plant species/data/Acer_Capillipes"
## [3] "~/Documents/R/aroundrblog/datascienceplus/leaves/100 leaves plant species/data/Acer_Circinatum"
## [4] "~/Documents/R/aroundrblog/datascienceplus/leaves/100 leaves plant species/data/Acer_Mono"
## [5] "~/Documents/R/aroundrblog/datascienceplus/leaves/100 leaves plant species/data/Acer_Opalus"
Before loading the JPEG images (resized to 100x100 pixels) and splitting them into training and test pools, let us count the image files available in each species folder.
files_in_dir <- sapply(data_dir, function(x) {length(list.files(x))})
files_in_dir
## ~/Documents/R/aroundrblog/datascienceplus/leaves/100 leaves plant species/data/Acer_Campestre
## 16
## ~/Documents/R/aroundrblog/datascienceplus/leaves/100 leaves plant species/data/Acer_Capillipes
## 16
## ~/Documents/R/aroundrblog/datascienceplus/leaves/100 leaves plant species/data/Acer_Circinatum
## 16
## ~/Documents/R/aroundrblog/datascienceplus/leaves/100 leaves plant species/data/Acer_Mono
## 16
## ~/Documents/R/aroundrblog/datascienceplus/leaves/100 leaves plant species/data/Acer_Opalus
## 16
## ~/Documents/R/aroundrblog/datascienceplus/leaves/100 leaves plant species/data/Acer_Palmatum
## 16
## ~/Documents/R/aroundrblog/datascienceplus/leaves/100 leaves plant species/data/Acer_Pictum
## 16
## ~/Documents/R/aroundrblog/datascienceplus/leaves/100 leaves plant species/data/Acer_Platanoids
## 16
## ~/Documents/R/aroundrblog/datascienceplus/leaves/100 leaves plant species/data/Acer_Rubrum
## 16
## ~/Documents/R/aroundrblog/datascienceplus/leaves/100 leaves plant species/data/Acer_Rufinerve
## 16
## ~/Documents/R/aroundrblog/datascienceplus/leaves/100 leaves plant species/data/Acer_Saccharinum
## 16
## ~/Documents/R/aroundrblog/datascienceplus/leaves/100 leaves plant species/data/Alnus_Cordata
## 16
## ~/Documents/R/aroundrblog/datascienceplus/leaves/100 leaves plant species/data/Alnus_Maximowiczii
## 16
## ~/Documents/R/aroundrblog/datascienceplus/leaves/100 leaves plant species/data/Alnus_Rubra
## 16
## ~/Documents/R/aroundrblog/datascienceplus/leaves/100 leaves plant species/data/Alnus_Sieboldiana
## 16
## ~/Documents/R/aroundrblog/datascienceplus/leaves/100 leaves plant species/data/Alnus_Viridis
## 16
## ~/Documents/R/aroundrblog/datascienceplus/leaves/100 leaves plant species/data/Arundinaria_Simonii
## 16
## ~/Documents/R/aroundrblog/datascienceplus/leaves/100 leaves plant species/data/Betula_Austrosinensis
## 16
## ~/Documents/R/aroundrblog/datascienceplus/leaves/100 leaves plant species/data/Betula_Pendula
## 16
## ~/Documents/R/aroundrblog/datascienceplus/leaves/100 leaves plant species/data/Callicarpa_Bodinieri
## 16
## ~/Documents/R/aroundrblog/datascienceplus/leaves/100 leaves plant species/data/Castanea_Sativa
## 16
## ~/Documents/R/aroundrblog/datascienceplus/leaves/100 leaves plant species/data/Celtis_Koraiensis
## 16
## ~/Documents/R/aroundrblog/datascienceplus/leaves/100 leaves plant species/data/Cercis_Siliquastrum
## 16
## ~/Documents/R/aroundrblog/datascienceplus/leaves/100 leaves plant species/data/Cornus_Chinensis
## 16
## ~/Documents/R/aroundrblog/datascienceplus/leaves/100 leaves plant species/data/Cornus_Controversa
## 16
## ~/Documents/R/aroundrblog/datascienceplus/leaves/100 leaves plant species/data/Cornus_Macrophylla
## 16
## ~/Documents/R/aroundrblog/datascienceplus/leaves/100 leaves plant species/data/Cotinus_Coggygria
## 16
## ~/Documents/R/aroundrblog/datascienceplus/leaves/100 leaves plant species/data/Crataegus_Monogyna
## 16
## ~/Documents/R/aroundrblog/datascienceplus/leaves/100 leaves plant species/data/Cytisus_Battandieri
## 16
## ~/Documents/R/aroundrblog/datascienceplus/leaves/100 leaves plant species/data/Eucalyptus_Glaucescens
## 16
## ~/Documents/R/aroundrblog/datascienceplus/leaves/100 leaves plant species/data/Eucalyptus_Neglecta
## 16
## ~/Documents/R/aroundrblog/datascienceplus/leaves/100 leaves plant species/data/Eucalyptus_Urnigera
## 16
## ~/Documents/R/aroundrblog/datascienceplus/leaves/100 leaves plant species/data/Fagus_Sylvatica
## 16
## ~/Documents/R/aroundrblog/datascienceplus/leaves/100 leaves plant species/data/Ginkgo_Biloba
## 16
## ~/Documents/R/aroundrblog/datascienceplus/leaves/100 leaves plant species/data/Ilex_Aquifolium
## 16
## ~/Documents/R/aroundrblog/datascienceplus/leaves/100 leaves plant species/data/Ilex_Cornuta
## 16
## ~/Documents/R/aroundrblog/datascienceplus/leaves/100 leaves plant species/data/Liquidambar_Styraciflua
## 16
## ~/Documents/R/aroundrblog/datascienceplus/leaves/100 leaves plant species/data/Liriodendron_Tulipifera
## 16
## ~/Documents/R/aroundrblog/datascienceplus/leaves/100 leaves plant species/data/Lithocarpus_Cleistocarpus
## 16
## ~/Documents/R/aroundrblog/datascienceplus/leaves/100 leaves plant species/data/Lithocarpus_Edulis
## 16
## ~/Documents/R/aroundrblog/datascienceplus/leaves/100 leaves plant species/data/Magnolia_Heptapeta
## 16
## ~/Documents/R/aroundrblog/datascienceplus/leaves/100 leaves plant species/data/Magnolia_Salicifolia
## 16
## ~/Documents/R/aroundrblog/datascienceplus/leaves/100 leaves plant species/data/Morus_Nigra
## 16
## ~/Documents/R/aroundrblog/datascienceplus/leaves/100 leaves plant species/data/Olea_Europaea
## 16
## ~/Documents/R/aroundrblog/datascienceplus/leaves/100 leaves plant species/data/Phildelphus
## 16
## ~/Documents/R/aroundrblog/datascienceplus/leaves/100 leaves plant species/data/Populus_Adenopoda
## 16
## ~/Documents/R/aroundrblog/datascienceplus/leaves/100 leaves plant species/data/Populus_Grandidentata
## 16
## ~/Documents/R/aroundrblog/datascienceplus/leaves/100 leaves plant species/data/Populus_Nigra
## 16
## ~/Documents/R/aroundrblog/datascienceplus/leaves/100 leaves plant species/data/Prunus_Avium
## 16
## ~/Documents/R/aroundrblog/datascienceplus/leaves/100 leaves plant species/data/Prunus_X_Shmittii
## 16
## ~/Documents/R/aroundrblog/datascienceplus/leaves/100 leaves plant species/data/Pterocarya_Stenoptera
## 16
## ~/Documents/R/aroundrblog/datascienceplus/leaves/100 leaves plant species/data/Quercus_Afares
## 16
## ~/Documents/R/aroundrblog/datascienceplus/leaves/100 leaves plant species/data/Quercus_Agrifolia
## 16
## ~/Documents/R/aroundrblog/datascienceplus/leaves/100 leaves plant species/data/Quercus_Alnifolia
## 16
## ~/Documents/R/aroundrblog/datascienceplus/leaves/100 leaves plant species/data/Quercus_Brantii
## 16
## ~/Documents/R/aroundrblog/datascienceplus/leaves/100 leaves plant species/data/Quercus_Canariensis
## 16
## ~/Documents/R/aroundrblog/datascienceplus/leaves/100 leaves plant species/data/Quercus_Castaneifolia
## 16
## ~/Documents/R/aroundrblog/datascienceplus/leaves/100 leaves plant species/data/Quercus_Cerris
## 16
## ~/Documents/R/aroundrblog/datascienceplus/leaves/100 leaves plant species/data/Quercus_Chrysolepis
## 16
## ~/Documents/R/aroundrblog/datascienceplus/leaves/100 leaves plant species/data/Quercus_Coccifera
## 16
## ~/Documents/R/aroundrblog/datascienceplus/leaves/100 leaves plant species/data/Quercus_Coccinea
## 16
## ~/Documents/R/aroundrblog/datascienceplus/leaves/100 leaves plant species/data/Quercus_Crassifolia
## 16
## ~/Documents/R/aroundrblog/datascienceplus/leaves/100 leaves plant species/data/Quercus_Crassipes
## 16
## ~/Documents/R/aroundrblog/datascienceplus/leaves/100 leaves plant species/data/Quercus_Dolicholepis
## 16
## ~/Documents/R/aroundrblog/datascienceplus/leaves/100 leaves plant species/data/Quercus_Ellipsoidalis
## 16
## ~/Documents/R/aroundrblog/datascienceplus/leaves/100 leaves plant species/data/Quercus_Greggii
## 16
## ~/Documents/R/aroundrblog/datascienceplus/leaves/100 leaves plant species/data/Quercus_Hartwissiana
## 16
## ~/Documents/R/aroundrblog/datascienceplus/leaves/100 leaves plant species/data/Quercus_Ilex
## 16
## ~/Documents/R/aroundrblog/datascienceplus/leaves/100 leaves plant species/data/Quercus_Imbricaria
## 16
## ~/Documents/R/aroundrblog/datascienceplus/leaves/100 leaves plant species/data/Quercus_Infectoria_sub
## 16
## ~/Documents/R/aroundrblog/datascienceplus/leaves/100 leaves plant species/data/Quercus_Kewensis
## 16
## ~/Documents/R/aroundrblog/datascienceplus/leaves/100 leaves plant species/data/Quercus_Nigra
## 16
## ~/Documents/R/aroundrblog/datascienceplus/leaves/100 leaves plant species/data/Quercus_Palustris
## 16
## ~/Documents/R/aroundrblog/datascienceplus/leaves/100 leaves plant species/data/Quercus_Phellos
## 16
## ~/Documents/R/aroundrblog/datascienceplus/leaves/100 leaves plant species/data/Quercus_Phillyraeoides
## 16
## ~/Documents/R/aroundrblog/datascienceplus/leaves/100 leaves plant species/data/Quercus_Pontica
## 16
## ~/Documents/R/aroundrblog/datascienceplus/leaves/100 leaves plant species/data/Quercus_Pubescens
## 16
## ~/Documents/R/aroundrblog/datascienceplus/leaves/100 leaves plant species/data/Quercus_Pyrenaica
## 16
## ~/Documents/R/aroundrblog/datascienceplus/leaves/100 leaves plant species/data/Quercus_Rhysophylla
## 16
## ~/Documents/R/aroundrblog/datascienceplus/leaves/100 leaves plant species/data/Quercus_Rubra
## 16
## ~/Documents/R/aroundrblog/datascienceplus/leaves/100 leaves plant species/data/Quercus_Semecarpifolia
## 16
## ~/Documents/R/aroundrblog/datascienceplus/leaves/100 leaves plant species/data/Quercus_Shumardii
## 16
## ~/Documents/R/aroundrblog/datascienceplus/leaves/100 leaves plant species/data/Quercus_Suber
## 16
## ~/Documents/R/aroundrblog/datascienceplus/leaves/100 leaves plant species/data/Quercus_Texana
## 16
## ~/Documents/R/aroundrblog/datascienceplus/leaves/100 leaves plant species/data/Quercus_Trojana
## 16
## ~/Documents/R/aroundrblog/datascienceplus/leaves/100 leaves plant species/data/Quercus_Variabilis
## 16
## ~/Documents/R/aroundrblog/datascienceplus/leaves/100 leaves plant species/data/Quercus_Vulcanica
## 16
## ~/Documents/R/aroundrblog/datascienceplus/leaves/100 leaves plant species/data/Quercus_x_Hispanica
## 16
## ~/Documents/R/aroundrblog/datascienceplus/leaves/100 leaves plant species/data/Quercus_x_Turneri
## 16
## ~/Documents/R/aroundrblog/datascienceplus/leaves/100 leaves plant species/data/Rhododendron_x_Russellianum
## 16
## ~/Documents/R/aroundrblog/datascienceplus/leaves/100 leaves plant species/data/Salix_Fragilis
## 16
## ~/Documents/R/aroundrblog/datascienceplus/leaves/100 leaves plant species/data/Salix_Intergra
## 16
## ~/Documents/R/aroundrblog/datascienceplus/leaves/100 leaves plant species/data/Sorbus_Aria
## 16
## ~/Documents/R/aroundrblog/datascienceplus/leaves/100 leaves plant species/data/Tilia_Oliveri
## 16
## ~/Documents/R/aroundrblog/datascienceplus/leaves/100 leaves plant species/data/Tilia_Platyphyllos
## 16
## ~/Documents/R/aroundrblog/datascienceplus/leaves/100 leaves plant species/data/Tilia_Tomentosa
## 16
## ~/Documents/R/aroundrblog/datascienceplus/leaves/100 leaves plant species/data/Ulmus_Bergmanniana
## 16
## ~/Documents/R/aroundrblog/datascienceplus/leaves/100 leaves plant species/data/Viburnum_Tinus
## 16
## ~/Documents/R/aroundrblog/datascienceplus/leaves/100 leaves plant species/data/Viburnum_x_Rhytidophylloides
## 16
## ~/Documents/R/aroundrblog/datascienceplus/leaves/100 leaves plant species/data/Zelkova_Serrata
## 16
length(files_in_dir)
## [1] 100
The output above shows that each of the 100 plant species folders holds 16 images. That is only a few images per species, a point I come back to in the Conclusions.
set.seed(1023)
img_dim <- 100 # images are resized to img_dim x img_dim pixels
train_size <- 0.8 # fraction of each species' images used for training
x_train <- list()
y_train <- list()
x_test <- list()
y_test <- list()
imgname_test <- list()
# considering all directories where images are located
for (i in 1:length(data_dir)) {
# changing directory
setwd(data_dir[[i]])
# list the files of a subfolder where one species is located
fl <- list.files()
l <- length(fl)
# sampling to determine the training set: floor(0.8 * 16) = 12 images
s <- sample(1:l, floor(train_size*l))
# training and test file names
train_x_name <- fl[s]
test_x_name <- fl[setdiff(1:l, s)]
# loading the training images and resizing them
img_x_l <- lapply(train_x_name, readJPEG)
img_x_l <- lapply(img_x_l, function(x) {resize(x, w=img_dim, h=img_dim)})
# updating the training x data
x_train <- c(x_train, img_x_l)
# the y train data represent an identifier associated to the
# subfolder in processing, hence associated to one species
y_train <- c(y_train, rep(i-1, length(s))) # image folders indexes need to be zero based
# loading the test images and resizing them
img_x_l <- lapply(test_x_name, readJPEG)
img_x_l <- lapply(img_x_l, function(x) {resize(x, w=img_dim, h=img_dim)})
# updating the test x data and y data in a similar fashion as the training data
x_test <- c(x_test, img_x_l)
y_test <- c(y_test, rep(i-1, l-length(s))) # image folders indexes need to be zero based
}
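The loop above performs a per-species (stratified) split: in every folder, 12 of the 16 images go to training and the remaining 4 to testing, which is consistent with the 1200 training and 400 test images reported below. A minimal base-R sketch of the same indexing logic without reading any image; `split_folder()` is a hypothetical helper name and the 100 x 16 layout is taken from the counts above:

```r
# Hypothetical helper: split the n files of one species folder into
# training and test indices, mirroring the loading loop.
split_folder <- function(n, train_size = 0.8) {
  n_train <- floor(train_size * n)          # 12 when n = 16
  train_idx <- sample(seq_len(n), n_train)  # random training subset
  test_idx <- setdiff(seq_len(n), train_idx)
  list(train = train_idx, test = test_idx)
}

n_species <- 100
files_per_species <- 16
splits <- lapply(seq_len(n_species), function(i) split_folder(files_per_species))

n_train <- sum(sapply(splits, function(s) length(s$train)))
n_test <- sum(sapply(splits, function(s) length(s$test)))
c(train = n_train, test = n_test)
```

Splitting inside each folder, rather than over the pooled 1600 images, guarantees that every species is represented in both pools.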
Let us have a look at some of the plant leaf images.
image(x_train[[1]])
image(x_train[[20]])
image(x_train[[50]])
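Base R `image()` draws `z[i, j]` at `x = i`, `y = j` starting from the bottom-left corner, so a pixel matrix (first row = top of the picture) is shown rotated by 90 degrees. A small sketch of hypothetical helpers, `upright()` and `show_leaf()`, that reorder the matrix so the leaf is displayed the right way up:

```r
# Hypothetical helper: reorder a pixel matrix (rows = image rows from the top)
# so that base graphics image() displays it upright.
upright <- function(m) {
  t(m)[, nrow(m):1, drop = FALSE]
}

# Hypothetical helper: grayscale plot without axes
show_leaf <- function(m) {
  image(upright(m), col = gray(seq(0, 1, length.out = 256)), axes = FALSE)
}

# usage with the images loaded above:
# show_leaf(x_train[[1]])
```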
Next, the image lists are converted into arrays and the test pool is split into validation and test datasets.
leaves_img <- list()
leaves_img$train <- list()
leaves_img$test <- list()
leaves_img$train$x <- array(0, c(length(x_train), img_dim, img_dim))
for (k in 1:length(x_train)) {
leaves_img$train$x[k,,] <- x_train[[k]]
}
leaves_img$train$y <- unlist(y_train)
leaves_img$test$x <- array(0, c(length(x_test), img_dim, img_dim))
dim(leaves_img$test$x)
## [1] 400 100 100
for (k in 1:length(x_test)) {
leaves_img$test$x[k,,] <- x_test[[k]]
}
leaves_img$test$y <- unlist(y_test)
x_train <- leaves_img$train$x
y_train <- leaves_img$train$y
x_test <- leaves_img$test$x
y_test <- leaves_img$test$y
x_train <- array_reshape(x_train, c(nrow(x_train), c(img_dim,img_dim,1)))
y_train <- to_categorical(y_train, img_list_len)
x_test <- array_reshape(x_test, c(nrow(x_test), c(img_dim,img_dim,1)))
y_test <- to_categorical(y_test, img_list_len)
# split the test pool in half: odd rows become the validation set,
# even rows stay in the test set (2 + 2 images per species)
val_seq <- seq(from=1, to=nrow(y_test), by=2)
x_val <- x_test[val_seq,,,,drop=FALSE]
y_val <- y_test[val_seq,,drop=FALSE]
x_test <- x_test[-c(val_seq),,,,drop=FALSE]
y_test <- y_test[-c(val_seq),,drop=FALSE]
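Because the test pool stores the four test images of each species in consecutive rows, taking odd rows for validation and even rows for testing gives every species two validation and two test images. A base-R check of that reasoning on simulated labels shaped like `leaves_img$test$y` (an assumption: 100 species, 4 test images each, in folder order); the `toy_` names are illustrative and do not overwrite the variables above:

```r
# Simulated zero-based test labels: 4 consecutive test images per species
toy_test_labels <- rep(0:99, each = 4)

toy_val_seq <- seq(from = 1, to = length(toy_test_labels), by = 2)  # odd rows
toy_val_labels <- toy_test_labels[toy_val_seq]
toy_final_test_labels <- toy_test_labels[-toy_val_seq]

table(table(toy_val_labels))         # each of the 100 species appears twice
table(table(toy_final_test_labels))  # and twice in the final test split
```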
In the end, the 1600 images are split into training, validation and test datasets with the following number of records.
nrow(y_train)
## [1] 1200
nrow(y_val)
## [1] 200
nrow(y_test)
## [1] 200
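The label matrices counted above are one-hot encoded by `to_categorical()`: the zero-based species identifier `k` becomes a row of 100 zeros with a 1 in column `k + 1`, which is what the softmax output layer is trained against. A base-R sketch of that encoding; `one_hot()` is a hypothetical name, not the keras implementation:

```r
# Hypothetical base-R equivalent of to_categorical() for zero-based labels
one_hot <- function(labels, num_classes) {
  m <- matrix(0, nrow = length(labels), ncol = num_classes)
  # label k goes to column k + 1 (R indexing is one based)
  m[cbind(seq_along(labels), labels + 1)] <- 1
  m
}

toy_labels <- c(0, 2, 1, 2)
encoded <- one_hot(toy_labels, 3)
encoded
# decoding: position of the 1 in each row, back to zero based
decoded <- max.col(encoded) - 1
decoded
```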
A Keras sequential model is then defined: a stack of convolutional layers, each followed by max pooling and dropout layers, then a dense hidden layer and a softmax output layer with one unit per species.
model <- keras_model_sequential() %>%
layer_conv_2d(filters = 32, kernel_size = c(3,3), activation = 'relu',
input_shape = c(img_dim, img_dim, 1)) %>%
layer_max_pooling_2d(pool_size = c(2,2)) %>%
layer_dropout(rate = 0.5) %>%
layer_conv_2d(filters = 64, kernel_size = c(3,3), activation = 'relu') %>%
layer_max_pooling_2d(pool_size = c(2,2)) %>%
layer_dropout(rate = 0.5) %>%
layer_conv_2d(filters = 128, kernel_size = c(3,3), activation = 'relu') %>%
layer_max_pooling_2d(pool_size = c(2,2)) %>%
layer_dropout(rate = 0.5) %>%
layer_conv_2d(filters = 128, kernel_size = c(3,3), activation = 'relu') %>%
layer_max_pooling_2d(pool_size = c(2,2)) %>%
layer_dropout(rate = 0.5)
model <- model %>% layer_flatten() %>%
layer_dense(units = 256, activation = 'relu') %>%
layer_dropout(rate = 0.5) %>%
layer_dense(units = img_list_len, activation = 'softmax')
summary(model)
## ___________________________________________________________________________
## Layer (type) Output Shape Param #
## ===========================================================================
## conv2d_1 (Conv2D) (None, 98, 98, 32) 320
## ___________________________________________________________________________
## max_pooling2d_1 (MaxPooling2D) (None, 49, 49, 32) 0
## ___________________________________________________________________________
## dropout_1 (Dropout) (None, 49, 49, 32) 0
## ___________________________________________________________________________
## conv2d_2 (Conv2D) (None, 47, 47, 64) 18496
## ___________________________________________________________________________
## max_pooling2d_2 (MaxPooling2D) (None, 23, 23, 64) 0
## ___________________________________________________________________________
## dropout_2 (Dropout) (None, 23, 23, 64) 0
## ___________________________________________________________________________
## conv2d_3 (Conv2D) (None, 21, 21, 128) 73856
## ___________________________________________________________________________
## max_pooling2d_3 (MaxPooling2D) (None, 10, 10, 128) 0
## ___________________________________________________________________________
## dropout_3 (Dropout) (None, 10, 10, 128) 0
## ___________________________________________________________________________
## conv2d_4 (Conv2D) (None, 8, 8, 128) 147584
## ___________________________________________________________________________
## max_pooling2d_4 (MaxPooling2D) (None, 4, 4, 128) 0
## ___________________________________________________________________________
## dropout_4 (Dropout) (None, 4, 4, 128) 0
## ___________________________________________________________________________
## flatten_1 (Flatten) (None, 2048) 0
## ___________________________________________________________________________
## dense_1 (Dense) (None, 256) 524544
## ___________________________________________________________________________
## dropout_5 (Dropout) (None, 256) 0
## ___________________________________________________________________________
## dense_2 (Dense) (None, 100) 25700
## ===========================================================================
## Total params: 790,500
## Trainable params: 790,500
## Non-trainable params: 0
## ___________________________________________________________________________
model %>% compile(
loss = 'categorical_crossentropy',
# this argument is named lr in older keras releases
optimizer = optimizer_adam(learning_rate = 1e-3),
metrics = c('accuracy')
)
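The shapes and parameter counts in the summary follow from two rules: a 3x3 convolution with Keras' default "valid" padding shrinks each side by 2 and has `(3 * 3 * in_channels + 1) * filters` parameters (weights plus one bias per filter), while a 2x2 max pooling halves each side, rounding down. A short R sketch that recomputes the summary table; the variable names are illustrative:

```r
kernel <- 3
filters <- c(32, 64, 128, 128)
side <- 100     # img_dim
channels <- 1   # grayscale input
conv_params <- numeric(length(filters))

for (b in seq_along(filters)) {
  conv_params[b] <- (kernel * kernel * channels + 1) * filters[b]  # weights + biases
  side <- side - (kernel - 1)  # valid convolution: 100 -> 98, 49 -> 47, ...
  side <- side %/% 2           # 2x2 max pooling, floor division
  channels <- filters[b]
}

flat <- side * side * channels  # flatten: 4 * 4 * 128
dense1 <- (flat + 1) * 256      # hidden dense layer
dense2 <- (256 + 1) * 100       # softmax layer, one unit per species
total <- sum(conv_params) + dense1 + dense2
c(side = side, flat = flat, total = total)
```

Dropout layers add no parameters, and most of the 790,500 weights sit in the first dense layer.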
Finally, we fit the deep learning model.
# https://github.com/keras-team/keras/issues/4298
#
# validation_split does not allow for val_acc different from zero here,
# hence an explicit validation set is passed via validation_data
history <- model %>% fit(
x_train,
y_train,
epochs = 40,
batch_size = 10,
validation_data = list(x_val, y_val),
shuffle = TRUE
)
plot(history)
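A likely reason `validation_split` fails here: according to the Keras documentation, the validation samples are taken from the last rows of `x` and `y`, before shuffling. Since `x_train` is ordered by species folder, the last 20% of its rows would hold only the last 20 species, which the model would never see during training. A base-R sketch of that effect, assuming the 1200 training labels in folder order (12 per species):

```r
# Zero-based training labels in folder order, as built by the loading loop
train_labels <- rep(0:99, each = 12)

validation_split <- 0.2
n_val <- floor(length(train_labels) * validation_split)
val_rows <- (length(train_labels) - n_val + 1):length(train_labels)

seen_in_training <- unique(train_labels[-val_rows])
held_out <- unique(train_labels[val_rows])

length(held_out)                               # species only in validation
length(intersect(seen_in_training, held_out))  # species shared by both parts
```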
We evaluate our model against the training dataset.
model %>% evaluate(x_train, y_train)
## $loss
## [1] 0.5067297
##
## $acc
## [1] 0.9183333
We then evaluate our model accuracy against the test dataset, which is the figure of interest.
model %>% evaluate(x_test, y_test)
## $loss
## [1] 0.9157
##
## $acc
## [1] 0.77
It is also interesting to look at the predictions and compare them with the actual test labels.
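# predict_classes() is deprecated in recent keras releases; equivalently,
# take the column of the highest softmax probability, minus one
pred_res <- max.col(predict(model, x_test), ties.method = "first") - 1
head(pred_res) # zero based species identifiers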
## [1] 0 8 1 1 0 2
test_image_ids <- leaves_img$test$y[-c(val_seq)]
comparison <- data.frame(pred_res, test_image_ids)
colnames(comparison) <- c("prediction", "actual")
head(comparison,30) # zero based species identifiers
## prediction actual
## 1 0 0
## 2 8 0
## 3 1 1
## 4 1 1
## 5 0 2
## 6 2 2
## 7 3 3
## 8 3 3
## 9 4 4
## 10 4 4
## 11 5 5
## 12 5 5
## 13 6 6
## 14 6 6
## 15 7 7
## 16 7 7
## 17 8 8
## 18 8 8
## 19 9 9
## 20 9 9
## 21 10 10
## 22 10 10
## 23 11 11
## 24 11 11
## 25 12 12
## 26 12 12
## 27 13 13
## 28 13 13
## 29 14 14
## 30 14 14
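The predicted identifiers come from an argmax over the softmax output: for each image, the column with the highest probability, shifted to zero based. A base-R sketch on a made-up probability matrix (the numbers are illustrative, not model output):

```r
# Made-up softmax output for 3 images and 4 species (rows sum to 1)
probs <- matrix(c(0.70, 0.10, 0.10, 0.10,
                  0.05, 0.05, 0.80, 0.10,
                  0.20, 0.50, 0.20, 0.10),
                nrow = 3, byrow = TRUE)

# zero-based class identifiers, one per image
pred_ids <- max.col(probs, ties.method = "first") - 1
pred_ids
```

`ties.method = "first"` makes the result deterministic; the default `"random"` would break ties at random.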
Let us compute accuracy by species.
accuracy_by_species <- comparison %>% group_by(actual) %>% summarise(round(100*sum(actual == prediction)/n(), 2))
accuracy_by_species_df <- as.data.frame(accuracy_by_species)
colnames(accuracy_by_species_df) <- c("species_id", "accuracy")
species_names <- lf[accuracy_by_species_df$species_id+1]
test_result <- data.frame(species=species_names, accuracy=accuracy_by_species_df$accuracy)
test_result
## species accuracy
## 1 Acer_Campestre 50
## 2 Acer_Capillipes 100
## 3 Acer_Circinatum 50
## 4 Acer_Mono 100
## 5 Acer_Opalus 100
## 6 Acer_Palmatum 100
## 7 Acer_Pictum 100
## 8 Acer_Platanoids 100
## 9 Acer_Rubrum 100
## 10 Acer_Rufinerve 100
## 11 Acer_Saccharinum 100
## 12 Alnus_Cordata 100
## 13 Alnus_Maximowiczii 100
## 14 Alnus_Rubra 100
## 15 Alnus_Sieboldiana 100
## 16 Alnus_Viridis 50
## 17 Arundinaria_Simonii 100
## 18 Betula_Austrosinensis 50
## 19 Betula_Pendula 100
## 20 Callicarpa_Bodinieri 100
## 21 Castanea_Sativa 100
## 22 Celtis_Koraiensis 0
## 23 Cercis_Siliquastrum 100
## 24 Cornus_Chinensis 50
## 25 Cornus_Controversa 0
## 26 Cornus_Macrophylla 0
## 27 Cotinus_Coggygria 100
## 28 Crataegus_Monogyna 100
## 29 Cytisus_Battandieri 100
## 30 Eucalyptus_Glaucescens 100
## 31 Eucalyptus_Neglecta 0
## 32 Eucalyptus_Urnigera 100
## 33 Fagus_Sylvatica 50
## 34 Ginkgo_Biloba 100
## 35 Ilex_Aquifolium 50
## 36 Ilex_Cornuta 100
## 37 Liquidambar_Styraciflua 100
## 38 Liriodendron_Tulipifera 100
## 39 Lithocarpus_Cleistocarpus 50
## 40 Lithocarpus_Edulis 100
## 41 Magnolia_Heptapeta 100
## 42 Magnolia_Salicifolia 50
## 43 Morus_Nigra 100
## 44 Olea_Europaea 100
## 45 Phildelphus 50
## 46 Populus_Adenopoda 50
## 47 Populus_Grandidentata 100
## 48 Populus_Nigra 100
## 49 Prunus_Avium 100
## 50 Prunus_X_Shmittii 100
## 51 Pterocarya_Stenoptera 50
## 52 Quercus_Afares 50
## 53 Quercus_Agrifolia 100
## 54 Quercus_Alnifolia 50
## 55 Quercus_Brantii 100
## 56 Quercus_Canariensis 100
## 57 Quercus_Castaneifolia 100
## 58 Quercus_Cerris 50
## 59 Quercus_Chrysolepis 0
## 60 Quercus_Coccifera 100
## 61 Quercus_Coccinea 100
## 62 Quercus_Crassifolia 50
## 63 Quercus_Crassipes 100
## 64 Quercus_Dolicholepis 0
## 65 Quercus_Ellipsoidalis 50
## 66 Quercus_Greggii 100
## 67 Quercus_Hartwissiana 50
## 68 Quercus_Ilex 100
## 69 Quercus_Imbricaria 50
## 70 Quercus_Infectoria_sub 100
## 71 Quercus_Kewensis 50
## 72 Quercus_Nigra 100
## 73 Quercus_Palustris 100
## 74 Quercus_Phellos 100
## 75 Quercus_Phillyraeoides 50
## 76 Quercus_Pontica 100
## 77 Quercus_Pubescens 100
## 78 Quercus_Pyrenaica 100
## 79 Quercus_Rhysophylla 0
## 80 Quercus_Rubra 100
## 81 Quercus_Semecarpifolia 100
## 82 Quercus_Shumardii 100
## 83 Quercus_Suber 50
## 84 Quercus_Texana 50
## 85 Quercus_Trojana 100
## 86 Quercus_Variabilis 100
## 87 Quercus_Vulcanica 50
## 88 Quercus_x_Hispanica 50
## 89 Quercus_x_Turneri 50
## 90 Rhododendron_x_Russellianum 100
## 91 Salix_Fragilis 50
## 92 Salix_Intergra 50
## 93 Sorbus_Aria 100
## 94 Tilia_Oliveri 100
## 95 Tilia_Platyphyllos 100
## 96 Tilia_Tomentosa 50
## 97 Ulmus_Bergmanniana 100
## 98 Viburnum_Tinus 50
## 99 Viburnum_x_Rhytidophylloides 0
## 100 Zelkova_Serrata 100
table(test_result$accuracy)
##
## 0 50 100
## 8 30 62
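With two test images per species, per-species accuracy can only be 0, 50 or 100%, which is why the table above has just three values. For readers not using dplyr, a base-R sketch of the same computation with `tapply()`, applied to the first six rows of the comparison table shown earlier (stored under the illustrative name `toy_comparison`):

```r
# First six rows of the comparison table (zero-based species identifiers)
toy_comparison <- data.frame(prediction = c(0, 8, 1, 1, 0, 2),
                             actual     = c(0, 0, 1, 1, 2, 2))

# percentage of correct predictions within each actual species
toy_accuracy <- tapply(toy_comparison$prediction == toy_comparison$actual,
                       toy_comparison$actual,
                       function(hit) round(100 * mean(hit), 2))
toy_accuracy
table(toy_accuracy)
```

The three values match the first three rows of the per-species table (Acer_Campestre 50, Acer_Capillipes 100, Acer_Circinatum 50).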
Let us look at the wrong predictions.
not_matching <- comparison[(comparison$prediction != comparison$actual),]
nrow(not_matching)
## [1] 46
nrow(not_matching)/nrow(comparison)
## [1] 0.23
head(not_matching, 10) # zero based species identifiers
## prediction actual
## 2 8 0
## 5 0 2
## 32 11 15
## 36 24 17
## 43 26 21
## 44 26 21
## 47 41 23
## 49 38 24
## 50 26 24
## 51 39 25
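The 46 mismatches are consistent with the per-species table: each species has two test images, so a species at 0% contributes two errors and one at 50% contributes one. A quick R check of that arithmetic, which also recovers the 0.23 error rate and the 0.77 test accuracy reported by `evaluate()`:

```r
# Counts from table(test_result$accuracy): 8 species at 0%, 30 at 50%, 62 at 100%
species_count <- c(`0` = 8, `50` = 30, `100` = 62)
errors_per_species <- c(`0` = 2, `50` = 1, `100` = 0)  # 2 test images per species

n_errors <- sum(species_count * errors_per_species)
n_test_images <- 2 * sum(species_count)
c(errors = n_errors,
  error_rate = n_errors / n_test_images,
  accuracy = 1 - n_errors / n_test_images)
```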
Let us see how some of the wrong predictions missed the actual plant species. The images are compared side by side: the actual species on the left, the wrongly predicted species on the right.
for (k in 1:10) {
# row of the misclassified image within x_test (row names of comparison)
row_idx <- as.integer(rownames(not_matching)[k])
pred_id <- not_matching$prediction[k] # zero based species identifiers
actual_id <- not_matching$actual[k]
actual_image <- x_test[row_idx,,,]
# a test image of the wrongly predicted species, for comparison
predicted_image <- x_test[which(test_image_ids == pred_id)[1],,,]
top_label <- paste(lf[actual_id + 1], lf[pred_id + 1], sep = " vs ")
img1 <- rasterGrob(as.raster(actual_image), interpolate = FALSE)
img2 <- rasterGrob(as.raster(predicted_image), interpolate = FALSE)
grid.arrange(img1, img2, ncol = 2, top = top_label)
}
Conclusions
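We showed a deep learning model able to classify the leaf images provided in the UCI 100 plant species zip file, reaching 77% accuracy on the test set (91.8% on the training set). The achieved accuracy is good given only 12 training images per species, and the gap between training and test accuracy suggests there is room for improvement. Data augmentation may help: it increases the number of training images for each species by applying label-preserving transformations, such as flips and rotations, to the existing ones. A larger and more varied training set may improve the training phase and, as a result, the accuracy.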
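As a rough illustration of data augmentation, the sketch below creates extra training images from each grayscale leaf matrix using only flips and rotations by multiples of 90 degrees, in base R. Which transformations preserve the species label is an assumption here, and the helper names are hypothetical; the keras R package also provides `image_data_generator()` for random on-the-fly transformations.

```r
# Hypothetical helpers: label-preserving transformations of a pixel matrix
flip_h <- function(m) m[, ncol(m):1, drop = FALSE]    # mirror left-right
flip_v <- function(m) m[nrow(m):1, , drop = FALSE]    # mirror top-bottom
rot90 <- function(m) t(m[nrow(m):1, , drop = FALSE])  # rotate 90 degrees clockwise

# Original matrix plus four augmented variants
augment <- function(m) {
  list(m, flip_h(m), flip_v(m), rot90(m), rot90(rot90(m)))
}

# Augment a list of images and replicate their labels accordingly
augment_set <- function(images, labels) {
  aug <- lapply(images, augment)
  list(x = do.call(c, aug),      # flat list of matrices
       y = rep(labels, each = 5))
}

# usage sketch on the lists built by the loading loop, before they are
# converted to arrays: 12 training images per species would become 60
# aug <- augment_set(x_train, y_train)
```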
References
[1] Charles Mallah, James Cope and James Orwell, "Plant leaf classification using probabilistic integration of shape, texture and margin features" [https://www.researchgate.net/publication/266632357_Plant_Leaf_Classification_using_Probabilistic_Integration_of_Shape_Texture_and_Margin_Features]
[2] 100 Plant Leaf Dataset [https://archive.ics.uci.edu/ml/datasets/One-hundred+plant+species+leaves+data+set]