A convolutional neural network (CNN) is a deep neural network architecture widely used for image recognition. In contrast to traditional machine learning methods, which require humans to manually specify object features such as color, size, and aspect ratio, CNNs learn to extract such features automatically from a large number of images. Thus, building an image recognition model with a CNN mainly requires collecting a large number of images rather than developing new feature-extraction algorithms. Today, this approach is popular in many research areas. In this workshop, we will learn how to use an R package named torch to build CNN models for image recognition.
The workshop introduces the fundamental concepts of CNNs by developing a simple image recognition model. By the end of the workshop, participants should be able to:
The workshop will proceed on the assumption that participants have the following basic knowledge:
for statement, functions)
Since the workshop focuses on the basics of deep learning, it is not intended for those who are already familiar with deep learning or who can build models with Python or other programming languages.
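In this workshop, we mainly use torch (PBC 2021), its companion package torchvision, and coro (Henry 2021) to perform deep learning, the jpeg package to read image data, and the tidyverse package for data summarization and visualization. To install these packages, run the following script.
install.packages('jpeg')
install.packages('tidyverse')
install.packages('coro')
install.packages('torch')
# torchvision provides the image transforms, image_folder_dataset, and model_resnet18 used below
install.packages('torchvision')
library('torch')
install_torch(timeout = 1200)
Then, we restart the R session and load these packages.
library('jpeg')
library('tidyverse')
library('coro')
library('torch')
library('torchvision')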
In this workshop, we use tf_flowers as an example image dataset to learn how to build and use image recognition models. The dataset can be downloaded from the TensorFlow Datasets website with the following code.
if (!file.exists('flower_photos')) {
  # remove training and validation subsets left over from a previous run
  unlink('flower_photos_train', recursive = TRUE)
  unlink('flower_photos_valid', recursive = TRUE)
  tf_flowers <- 'https://storage.googleapis.com/download.tensorflow.org/example_images/flower_photos.tgz'
  download.file(tf_flowers, destfile = 'flower_photos.tgz')
  untar('flower_photos.tgz')
  list.files('flower_photos', recursive = FALSE)
  # keep only the class folders so that every entry is an image category
  file.remove('flower_photos/LICENSE.txt')
  head(list.files('flower_photos/sunflowers', recursive = FALSE))
}
In order to get an overview of the data, we first summarize the number of categories and the number of images in each category.
train_images <- list()
for (class in list.files('flower_photos', recursive = FALSE)) {
  train_images[[class]] <- sort(list.files(file.path('flower_photos', class), recursive = TRUE))
}
data.frame(class = names(train_images),
           n_images = sapply(train_images, length)) %>%
  ggplot(aes(x = class, y = n_images)) +
  geom_bar(stat = 'identity')
We can see that this dataset contains five categories (daisy, dandelion, roses, sunflowers, and tulips) and that each category contains more than 600 images.
To perform model training and validation with this dataset, we first split the dataset into two subsets: a training subset and a validation subset. To reduce training time in this workshop, we select only 20 images per category for training and 10 images per category for validation.
n_train_images <- 20
n_valid_images <- 10
class_labels <- c('dandelion', 'sunflowers', 'roses', 'tulips', 'daisy')
dir.create('flower_photos_train', showWarnings = FALSE)
dir.create('flower_photos_valid', showWarnings = FALSE)
for (class in names(train_images)) {
  if (class %in% class_labels) {
    dir.create(file.path('flower_photos_train', class), showWarnings = FALSE)
    dir.create(file.path('flower_photos_valid', class), showWarnings = FALSE)
    for (i in seq_along(train_images[[class]])) {
      if (i <= n_train_images) {
        # the first n_train_images files go to the training subset
        file.copy(file.path('flower_photos', class, train_images[[class]][i]),
                  file.path('flower_photos_train', class, train_images[[class]][i]))
      } else if (n_train_images < i && i <= n_train_images + n_valid_images) {
        # the next n_valid_images files go to the validation subset
        file.copy(file.path('flower_photos', class, train_images[[class]][i]),
                  file.path('flower_photos_valid', class, train_images[[class]][i]))
      }
    }
  }
}
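The positional split used above can also be written without nested loops. The following is a minimal sketch in base R, assuming a hypothetical helper named split_files (not part of the workshop code) and toy file names; it only illustrates how the first 20 names and the next 10 names are selected.

```r
# Hypothetical helper (not part of the workshop code): split a vector of
# file names into a training part and a validation part by position.
split_files <- function(files, n_train, n_valid) {
  n <- length(files)
  train_idx <- seq_len(min(n_train, n))
  # validation files are the ones that follow the training files, if any remain
  valid_idx <- setdiff(seq_len(min(n_train + n_valid, n)), train_idx)
  list(train = files[train_idx], valid = files[valid_idx])
}

# Toy file names used only for illustration
files <- sprintf("img_%02d.jpg", 1:50)
subsets <- split_files(files, n_train = 20, n_valid = 10)
length(subsets$train)
length(subsets$valid)
# the two subsets must not share any file
intersect(subsets$train, subsets$valid)
```

Keeping the two subsets disjoint matters because validation images that were also used for training would make the validation accuracy look better than it is.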
Then, we define the preprocessing pipelines that are applied to images during training and validation.
train_transforms <- function(img) {
  img <- transform_to_tensor(img)
  img <- transform_resize(img, size = c(512, 512))
  # random crop, color jitter, and flip augment the small training set
  img <- transform_random_resized_crop(img, size = c(224, 224))
  img <- transform_color_jitter(img)
  img <- transform_random_horizontal_flip(img)
  img <- transform_normalize(img, mean = c(0.485, 0.456, 0.406), std = c(0.229, 0.224, 0.225))
  img
}
valid_transforms <- function(img) {
  img <- transform_to_tensor(img)
  img <- transform_resize(img, size = c(256, 256))
  # no random augmentation for validation: always take the center crop
  img <- transform_center_crop(img, 224)
  img <- transform_normalize(img, mean = c(0.485, 0.456, 0.406), std = c(0.229, 0.224, 0.225))
  img
}
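The last step of both pipelines rescales each color channel by subtracting a mean and dividing by a standard deviation. The following base R sketch reproduces only that arithmetic on a toy array; the helper name normalize_image and the 3 x 2 x 2 array are assumptions made for illustration and are not part of torchvision.

```r
# Channel-wise normalization, (x - mean) / std, written in base R.
# The 3 x 2 x 2 array stands in for a tiny channels-first RGB image.
normalize_image <- function(img, mean, std) {
  out <- img
  for (ch in seq_len(dim(img)[1])) {
    out[ch, , ] <- (img[ch, , ] - mean[ch]) / std[ch]
  }
  out
}

img <- array(0.5, dim = c(3, 2, 2))
img_norm <- normalize_image(img,
                            mean = c(0.485, 0.456, 0.406),
                            std  = c(0.229, 0.224, 0.225))
# normalized value of the first pixel in each of the three channels
img_norm[, 1, 1]
```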
As the last step, we use the image_folder_dataset function to automatically collect training images from the given folder, and use dataloader to manage the dataset during training.
dataset_train <- image_folder_dataset('flower_photos_train', transform = train_transforms)
dataset_train$classes
dataloader_train <- dataloader(dataset_train, batch_size = 2, shuffle = TRUE)
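Conceptually, a shuffled dataloader with batch_size = 2 visits every sample once per epoch in random order, two samples at a time. The following base R sketch mimics only that index bookkeeping; the sample count of 100 (5 categories with 20 images each) and the variable names are assumptions for illustration, and no images are loaded.

```r
set.seed(1)
n_samples <- 100   # assumed: 5 categories x 20 training images
batch_size <- 2

# shuffle the sample indices, then cut them into consecutive minibatches
idx <- sample(n_samples)
batches <- split(idx, ceiling(seq_along(idx) / batch_size))

length(batches)   # number of minibatches per epoch
batches[[1]]      # indices of the samples in the first minibatch
```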
The first step of model construction with the torch package is to design the architecture of the neural network. Here, we design a neural network that receives an image and outputs length(dataset_train$classes) values, which can be considered as a score for each category. Specifically, we will design a 7-layer CNN: four layers (two convolutional layers and two pooling layers) for feature extraction and three fully connected layers for classification.
To design the network architecture, we follow the conventions of torch and create a class with initialize and forward functions. The initialize function declares the components of the network, whereas the forward function defines the order in which the components are connected.
SimpleCNN <- nn_module(
  "SimpleCNN",
  initialize = function(n_classes) {
    self$conv1 <- nn_conv2d(3, 16, 5)
    self$pool1 <- nn_max_pool2d(2, 2)
    self$conv2 <- nn_conv2d(16, 32, 5)
    self$pool2 <- nn_max_pool2d(2, 2)
    # number of values per image after pool2, assuming 224 x 224 inputs
    n_inputs <- (((((224 - 5 + 1) / 2) - 5 + 1) / 2) ^ 2) * 32
    self$fc1 <- nn_linear(in_features = n_inputs, out_features = 512)
    self$fc2 <- nn_linear(in_features = 512, out_features = 64)
    self$fc3 <- nn_linear(in_features = 64, out_features = n_classes)
  },
  forward = function(x) {
    x <- self$conv1(x)
    x <- nnf_relu(x)
    x <- self$pool1(x)
    x <- self$conv2(x)
    x <- nnf_relu(x)
    x <- self$pool2(x)
    # flatten the feature maps of each image into one vector
    x <- torch_flatten(x, start_dim = 2)
    x <- self$fc1(x)
    x <- nnf_relu(x)
    x <- self$fc2(x)
    x <- nnf_relu(x)
    x <- self$fc3(x)
    x
  }
)
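The value of n_inputs in the class above follows from how each layer shrinks a 224 x 224 input. The following base R sketch traces that arithmetic step by step, assuming no padding, a stride of 1 for the convolutions, and a stride of 2 for the pooling layers (the settings implied by the calls above); the helper name out_size is hypothetical.

```r
# Output side length of a convolution or pooling layer without padding.
out_size <- function(n, kernel, stride = 1) {
  floor((n - kernel) / stride) + 1
}

side <- 224
side <- out_size(side, kernel = 5)              # conv1: 220
side <- out_size(side, kernel = 2, stride = 2)  # pool1: 110
side <- out_size(side, kernel = 5)              # conv2: 106
side <- out_size(side, kernel = 2, stride = 2)  # pool2: 53
n_inputs <- side^2 * 32                         # 32 channels after conv2
n_inputs                                        # 89888
```

If the input size or the kernel sizes change, n_inputs has to be recomputed in the same way; otherwise the first fully connected layer will not match the flattened feature maps.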
In this subsection, we will create an instance from the model architecture and assign the dataset for model training. Here is an example of creating an instance from the SimpleCNN class.
model <- SimpleCNN(length(dataset_train$classes))
To train the model, we specify a training algorithm and a loss function in advance. Since the cross-entropy loss is commonly used for classification problems, we use it as the loss function to train the model. In addition, we use the Adam algorithm to optimize the model, since it is a popular optimizer that works well in many situations.
criterion <- nn_cross_entropy_loss()
optimizer <- optim_adam(model$parameters)
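To make the loss function more concrete, the following base R sketch computes the cross-entropy loss for a single sample from its raw scores. It is an illustration under stated assumptions, not the torch implementation: the function name cross_entropy and the toy scores are made up, and the target is a 1-based class index.

```r
# Cross-entropy loss for one sample, computed from raw scores (logits).
cross_entropy <- function(logits, target) {
  # log-softmax, with the maximum subtracted for numerical stability
  shifted <- logits - max(logits)
  log_prob <- shifted - log(sum(exp(shifted)))
  -log_prob[target]
}

logits <- c(2.0, 0.5, 0.1, -1.0, 0.3)  # toy scores for five classes
cross_entropy(logits, target = 1)      # small loss: class 1 has the top score
cross_entropy(logits, target = 4)      # larger loss: class 4 has the lowest score
```

The loss is small when the true class receives the highest score and grows as the score of the true class falls behind the others, which is what training tries to reduce.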
Next, we send the prepared dataset and the model to a device (CPU or GPU) for training. Here, we use a for statement to train for 5 epochs with the same dataset. At each epoch, we train the model with each of the minibatches defined by dataloader.
model$to(device = 'cpu')
model$train()
loss_train <- c()
for (epoch in 1:5) {
  loss_running <- 0
  n_train_samples <- 0
  coro::loop(for (b in dataloader_train) {
    # reset gradients, run the forward pass, and update the parameters
    optimizer$zero_grad()
    output <- model(b$x$to(device = 'cpu'))
    loss <- criterion(output, b$y$to(device = 'cpu'))
    loss$backward()
    optimizer$step()
    # accumulate the loss weighted by the number of images in the minibatch
    loss_running <- loss_running + loss$item() * nrow(b$x)
    n_train_samples <- n_train_samples + nrow(b$x)
  })
  loss_train <- c(loss_train, loss_running / n_train_samples)
  cat(sprintf("epoch %d loss: %.3f\n", epoch, loss_running / n_train_samples))
}
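The loop above multiplies each minibatch loss by the minibatch size before summing, so that the epoch loss is an average over images rather than over minibatches. The following base R sketch shows why this matters when the last minibatch is smaller; the loss values and sizes are toy numbers chosen for illustration.

```r
batch_loss <- c(1.2, 0.8, 0.4)  # toy mean loss of each minibatch
batch_size <- c(2, 2, 1)        # the last minibatch may be smaller

# per-image average, as computed in the training loop
epoch_loss <- sum(batch_loss * batch_size) / sum(batch_size)
epoch_loss

# a plain average over minibatches gives the small minibatch too much weight
mean(batch_loss)
```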
Plotting the loss for each epoch shows that the training loss decreased during training.
data.frame(epoch = 1:length(loss_train), loss = loss_train) %>%
  ggplot(aes(x = epoch, y = loss)) +
  geom_line()
To refine the model, the training process can be continued. Let us train the model for 5 more epochs.
model$train()
for (epoch in 6:10) {
  loss_running <- 0
  n_train_samples <- 0
  coro::loop(for (b in dataloader_train) {
    optimizer$zero_grad()
    output <- model(b$x$to(device = 'cpu'))
    loss <- criterion(output, b$y$to(device = 'cpu'))
    loss$backward()
    optimizer$step()
    loss_running <- loss_running + loss$item() * nrow(b$x)
    n_train_samples <- n_train_samples + nrow(b$x)
  })
  # append to the losses of epochs 1 to 5 recorded earlier
  loss_train <- c(loss_train, loss_running / n_train_samples)
  cat(sprintf("epoch %d loss: %.3f\n", epoch, loss_running / n_train_samples))
}
data.frame(epoch = 1:length(loss_train), loss = loss_train) %>%
  ggplot(aes(x = epoch, y = loss)) +
  geom_line()
Some popular CNN architectures are already implemented in the torchvision package, a companion to torch, so users can call these architectures without defining them from scratch. Here is an example that loads ResNet, one of the popular CNN architectures, and trains the network. As ResNet has a large number of parameters, training takes more time.
model <- model_resnet18(pretrained = FALSE)
num_features <- model$fc$in_features
model$fc <- nn_linear(in_features = num_features, out_features = length(dataset_train$classes))
criterion <- nn_cross_entropy_loss()
optimizer <- optim_sgd(model$parameters, lr = 0.1)
loss_train <- c()
for (epoch in 1:10) {
  loss_running <- 0
  n_train_samples <- 0
  coro::loop(for (b in dataloader_train) {
    optimizer$zero_grad()
    output <- model(b$x$to(device = 'cpu'))
    loss <- criterion(output, b$y$to(device = 'cpu'))
    loss$backward()
    optimizer$step()
    loss_running <- loss_running + loss$item() * nrow(b$x)
    n_train_samples <- n_train_samples + nrow(b$x)
  })
  loss_train <- c(loss_train, loss_running / n_train_samples)
  cat(sprintf("epoch %d loss: %.3f\n", epoch, loss_running / n_train_samples))
}
data.frame(epoch = 1:length(loss_train), loss = loss_train) %>%
  ggplot(aes(x = epoch, y = loss)) +
  geom_line()
Here we use the validation dataset to evaluate the model performance. The procedure for validation is the same as that for training: (i) preprocess the dataset and create a dataloader, and (ii) feed the dataloader to the model. We begin by preparing the validation dataset.
dataset_valid <- image_folder_dataset('flower_photos_valid', transform = valid_transforms)
dataset_valid$classes
dataloader_valid <- dataloader(dataset_valid, batch_size = 2)
Then, as in the training steps, we use a for statement to feed the validation dataset to the model and retrieve the prediction results. Note that the model should be switched to evaluation mode with model$eval() beforehand, so that layers such as batch normalization and dropout behave as intended for inference instead of as they do during training.
model$eval()
y_true <- c()
y_pred <- c()
loss_valid <- 0
n_valid_samples <- 0
coro::loop(for (b in dataloader_valid) {
  output <- model(b$x$to(device = 'cpu'))
  # the predicted class is the one with the highest score
  output_class_id <- torch_argmax(output, dim = 2)
  y_true <- c(y_true, as.numeric(b$y))
  y_pred <- c(y_pred, as.numeric(output_class_id))
  loss <- criterion(output, b$y$to(device = 'cpu'))
  loss_valid <- loss_valid + loss$item() * nrow(b$x)
  n_valid_samples <- n_valid_samples + nrow(b$x)
})
loss_valid <- loss_valid / n_valid_samples
# accuracy: fraction of validation images whose predicted class is correct
acc_valid <- sum(y_true == y_pred) / n_valid_samples
acc_valid
Then we plot a scatter chart to visualize the correlation between predicted values and the true values.
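Another common way to compare predicted and true classes is a confusion matrix. The following base R sketch builds one with table(); it is an alternative view offered for illustration, not the plot used in the workshop, and the vectors y_true_toy and y_pred_toy are made-up class indices rather than real model output.

```r
class_names <- c('daisy', 'dandelion', 'roses', 'sunflowers', 'tulips')

# Toy class indices for ten images (made up for illustration)
y_true_toy <- c(1, 1, 2, 2, 3, 3, 4, 4, 5, 5)
y_pred_toy <- c(1, 2, 2, 2, 3, 5, 4, 4, 5, 3)

# rows are true classes, columns are predicted classes
conf_mat <- table(
  true = factor(class_names[y_true_toy], levels = class_names),
  predicted = factor(class_names[y_pred_toy], levels = class_names)
)
conf_mat

# correct predictions lie on the diagonal
acc <- sum(diag(conf_mat)) / sum(conf_mat)
acc
```

Off-diagonal cells show which pairs of categories the model confuses, which the single accuracy value cannot tell us.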
In this subsection, we show an example of running inference on a new image. We load an image with the jpeg package, preprocess the image with the valid_transforms pipeline, and input the image to the trained model. The output from the model is a vector of real numbers, which can be converted to probability-like values with the softmax function.
x <- 'flower_photos/sunflowers/9410186154_465642ed35.jpg'
x <- jpeg::readJPEG(x)
x <- valid_transforms(x)
# add a leading batch dimension: the model expects a batch of images
x_batch <- array(NA, dim = c(1, dim(x)))
x_batch[1,,,] <- as.array(x)
x_batch_tensor <- torch_tensor(x_batch)
output <- model(x_batch_tensor)
output
# apply softmax while the output is still a tensor, then convert to numeric
as.numeric(nnf_softmax(output, dim = 2))
dataset_train$classes
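The conversion from raw scores to probability-like values can be followed by hand. The following base R sketch applies softmax to a toy score vector and looks up the most likely category; the scores are made up for illustration and are not real model output, and the helper name softmax is an assumption.

```r
# Softmax: exponentiate the scores and rescale them so that they sum to 1.
softmax <- function(scores) {
  z <- exp(scores - max(scores))  # subtract the maximum for numerical stability
  z / sum(z)
}

scores <- c(-1.2, 0.3, 0.1, 2.5, 0.4)  # toy values, not real model output
class_names <- c('daisy', 'dandelion', 'roses', 'sunflowers', 'tulips')

prob <- softmax(scores)
round(prob, 3)
sum(prob)

# the predicted category is the one with the largest value
class_names[which.max(prob)]
```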
The trained model can be saved with the torch_save function. Note that a model saved with the standard save function is environment-dependent, so it may not be possible to load it in other environments (computers).
torch_save(model, 'my_model.pth')
The model can be loaded from a file with the torch_load function. Models loaded by torch_load can be used for inference or retraining.
mymodel <- torch_load('my_model.pth')
mymodel$eval()
x <- 'flower_photos/sunflowers/9410186154_465642ed35.jpg'
x <- jpeg::readJPEG(x)
x <- valid_transforms(x)
x_batch <- array(NA, dim = c(1, dim(x)))
x_batch[1,,,] <- as.array(x)
x_batch_tensor <- torch_tensor(x_batch)
# run inference with the model that was just loaded from the file
output <- mymodel(x_batch_tensor)
output
# apply softmax while the output is still a tensor, then convert to numeric
as.numeric(nnf_softmax(output, dim = 2))
devtools::session_info()
#> ─ Session info ──────────────────────────────────────────────────────────────
#> hash: cinema, right-facing fist: dark skin tone, yellow square
#>
#> setting value
#> version R Under development (unstable) (2021-10-28 r81109)
#> os Ubuntu 20.04.3 LTS
#> system x86_64, linux-gnu
#> ui X11
#> language (EN)
#> collate en_US.UTF-8
#> ctype en_US.UTF-8
#> tz Etc/UTC
#> date 2021-10-31
#> pandoc 2.14.0.3 @ /usr/local/bin/ (via rmarkdown)
#>
#> ─ Packages ───────────────────────────────────────────────────────────────────
#> package * version date (UTC) lib source
#> cachem 1.0.6 2021-08-19 [2] CRAN (R 4.2.0)
#> callr 3.7.0 2021-04-20 [2] CRAN (R 4.2.0)
#> cli 3.1.0 2021-10-27 [2] CRAN (R 4.2.0)
#> crayon 1.4.2 2021-10-29 [2] CRAN (R 4.2.0)
#> desc 1.4.0 2021-09-28 [2] CRAN (R 4.2.0)
#> devtools * 2.4.2 2021-06-07 [2] CRAN (R 4.2.0)
#> digest 0.6.28 2021-09-23 [2] CRAN (R 4.2.0)
#> ellipsis 0.3.2 2021-04-29 [2] CRAN (R 4.2.0)
#> evaluate 0.14 2019-05-28 [2] CRAN (R 4.2.0)
#> fastmap 1.1.0 2021-01-25 [2] CRAN (R 4.2.0)
#> fs 1.5.0 2020-07-31 [2] CRAN (R 4.2.0)
#> glue 1.4.2 2020-08-27 [2] CRAN (R 4.2.0)
#> htmltools 0.5.2 2021-08-25 [2] CRAN (R 4.2.0)
#> knitr 1.36 2021-09-29 [2] CRAN (R 4.2.0)
#> lifecycle 1.0.1 2021-09-24 [2] CRAN (R 4.2.0)
#> magrittr 2.0.1 2020-11-17 [2] CRAN (R 4.2.0)
#> memoise 2.0.0 2021-01-26 [2] CRAN (R 4.2.0)
#> pkgbuild 1.2.0 2020-12-15 [2] CRAN (R 4.2.0)
#> pkgdown 1.6.1 2020-09-12 [2] CRAN (R 4.2.0)
#> pkgload 1.2.3 2021-10-13 [2] CRAN (R 4.2.0)
#> prettyunits 1.1.1 2020-01-24 [2] CRAN (R 4.2.0)
#> processx 3.5.2 2021-04-30 [2] CRAN (R 4.2.0)
#> ps 1.6.0 2021-02-28 [2] CRAN (R 4.2.0)
#> purrr 0.3.4 2020-04-17 [2] CRAN (R 4.2.0)
#> R6 2.5.1 2021-08-19 [2] CRAN (R 4.2.0)
#> ragg 1.2.0 2021-10-30 [2] CRAN (R 4.2.0)
#> remotes 2.4.1 2021-09-29 [2] CRAN (R 4.2.0)
#> rlang 0.4.12 2021-10-18 [2] CRAN (R 4.2.0)
#> rmarkdown 2.11 2021-09-14 [2] CRAN (R 4.2.0)
#> rprojroot 2.0.2 2020-11-15 [2] CRAN (R 4.2.0)
#> sessioninfo 1.2.0 2021-10-31 [2] CRAN (R 4.2.0)
#> stringi 1.7.5 2021-10-04 [2] CRAN (R 4.2.0)
#> stringr 1.4.0 2019-02-10 [2] CRAN (R 4.2.0)
#> systemfonts 1.0.3 2021-10-13 [2] CRAN (R 4.2.0)
#> testthat 3.1.0 2021-10-04 [2] CRAN (R 4.2.0)
#> textshaping 0.3.6 2021-10-13 [2] CRAN (R 4.2.0)
#> usethis * 2.1.3 2021-10-27 [2] CRAN (R 4.2.0)
#> withr 2.4.2 2021-04-18 [2] CRAN (R 4.2.0)
#> xfun 0.27 2021-10-18 [2] CRAN (R 4.2.0)
#> yaml 2.2.1 2020-02-01 [2] CRAN (R 4.2.0)
#>
#> [1] /tmp/RtmpvfiGT9/temp_libpath2a344ca05124
#> [2] /usr/local/lib/R/site-library
#> [3] /usr/local/lib/R/library
#>
#> ──────────────────────────────────────────────────────────────────────────────