Enabling Image Recognition on Constrained Devices Using Neural Network Pruning and a CycleGAN

AI/ML shows potential in virtually every domain, and there is growing interest in running the technology on constrained edge devices. This study explores two possible ways forward for image recognition on such devices.

This study in an underpass in Helsingborg happened by chance. Two students, Simon and Ludwig, emailed me and asked if they could do a computer science project (7.5 credits at Lund University) within the SMILE program. I agreed to find something interesting. Their friends August and Daniel heard about it, and we decided to design two related student projects that could run in parallel. Instead of the automotive context of SMILE, we identified an application that fit the newly started AIQ Meta-Testbed project in Helsingborg. Both projects were highly successful and resulted in a joint paper.

Classifying activity in an underpass

A previous research project in Helsingborg had installed an Axis network camera in an underpass to detect illegal graffiti and doodling as early as possible. Graffiti removal is a substantial cost for many municipalities, and early removal is the preferred way to tackle the issue. That project had ended, but we got permission to use images from the camera to train classifiers and test them in various ways. We trained a Deep Neural Network (DNN) for multi-class classification, i.e., detecting the presence of pedestrians, dog walkers, and bicyclists.

The four categories of data that our DNNs use in the classification application.
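For concreteness, here is a minimal sketch of how such a classifier could be fine-tuned. The paper does not specify the exact training setup, so the use of ImageNet-pretrained weights, the data layout, and all hyperparameters below are illustrative assumptions.

```python
# Minimal sketch: fine-tuning VGG16 for the four underpass categories.
# Pretrained weights, data layout, and hyperparameters are assumptions,
# not the paper's actual setup.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision import datasets, models, transforms

transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])
# Hypothetical ImageFolder layout: one sub-directory per category.
train_data = datasets.ImageFolder("underpass/train", transform=transform)
loader = DataLoader(train_data, batch_size=32, shuffle=True)

model = models.vgg16(weights="IMAGENET1K_V1")
model.classifier[6] = nn.Linear(4096, len(train_data.classes))  # 4-way head

optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()
model.train()
for images, labels in loader:  # one epoch shown for brevity
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()
```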

Smart cameras are examples of edge devices with AI capabilities. On such constrained devices, compute, memory, bandwidth, and the like are all limited resources. In this project, we explored two ways to support the smart cameras of the future: neural network pruning and using a CycleGAN to transform out-of-distribution images.

Robustness is an essential quality attribute in image recognition. For a trained DNN, robustness means handling perturbations and input data that don't resemble the training data. Such out-of-distribution input is the primary topic of our previous work in SMILE. Inspired by automotive engineering, we specified the Operational Design Domain (ODD) of our classification model to cover daytime conditions. We then hypothesized that a CycleGAN could transform out-of-distribution input (such as night images) to the ODD of our application.
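As an aside, a deployed pipeline needs some way to decide when an input falls outside the daytime ODD. Below is a minimal sketch of such a gate; the mean-brightness heuristic and the threshold are illustrative assumptions, not part of the paper.

```python
# Minimal sketch of a naive ODD gate: treat dark frames as out-of-distribution
# (nighttime) and route them through the CycleGAN before classification.
# The brightness threshold is an illustrative assumption, not from the paper.
import numpy as np
from PIL import Image

def in_daytime_odd(path: str, threshold: float = 80.0) -> bool:
    """Return True if the frame's mean luminance suggests daytime conditions."""
    gray = np.asarray(Image.open(path).convert("L"), dtype=np.float32)
    return gray.mean() >= threshold

frame = "frame_0001.jpg"  # hypothetical filename
route = "classifier" if in_daytime_odd(frame) else "cyclegan -> classifier"
print(route)
```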

Small footprint? Prune the network!

We relied on a standard VGG16 architecture as a baseline for our image classification experiments. VGG16 has 16 layers with trainable parameters, including 13 convolutional layers; in total, there are 134 million parameters to train. In our application context, we only had 6,000 annotated images from the underpass. Our first research question asked how far this DNN could be pruned while maintaining accurate results. Our results show that a DNN as small as 1% of the original VGG16 architecture performed comparably. Small networks might very well be enough for specialized classification applications on constrained edge devices.
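To illustrate the idea, here is a minimal pruning sketch using PyTorch's built-in pruning utilities. The paper's actual pruning procedure and ratios may differ; the 90% structured pruning below is only an example.

```python
# Minimal sketch: structured magnitude pruning of VGG16 in PyTorch.
# The paper's exact pruning procedure may differ; this only illustrates
# the general technique of removing low-magnitude neurons/filters.
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune
from torchvision import models

model = models.vgg16(weights=None)
model.classifier[6] = nn.Linear(4096, 4)  # four-class underpass head

# Prune 90% of the filters/neurons (by L1 norm of their weights) per layer,
# keeping the 4-way output layer intact.
prunable = [m for m in model.modules() if isinstance(m, (nn.Conv2d, nn.Linear))]
for module in prunable[:-1]:
    prune.ln_structured(module, name="weight", amount=0.9, n=1, dim=0)

# After fine-tuning the pruned network, make the pruning permanent.
for module in prunable[:-1]:
    prune.remove(module, "weight")

total = sum(p.numel() for p in model.parameters())
nonzero = sum(int((p != 0).sum()) for p in model.parameters())
print(f"{nonzero / total:.1%} of parameters remain non-zero")
```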

CycleGAN that OOD, bring it to the ODD

We used an open implementation of the CycleGAN architecture proposed by Zhu et al. (2017). The architecture consists of two discriminator models and two generator models. We trained the CycleGAN on a dataset of 1,128 images from the underpass, with an equal share of images from the daytime and nighttime domains. Using this CycleGAN, we could transform out-of-distribution nighttime images to the daytime ODD. And yes – the classification accuracy appears to be better for the transformed images than for the originals.
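The sketch below shows how a trained generator could be dropped into the classification pipeline. The checkpoint files and the `G_night2day` handle are placeholders, not the paper's actual code; the 256×256 input size and (-1, 1) normalization follow common CycleGAN conventions.

```python
# Minimal sketch: using a trained CycleGAN generator to map nighttime images
# into the daytime ODD before classification. Checkpoint names are
# hypothetical, assumed saved with torch.save(model).
import torch
from torchvision import transforms
from PIL import Image

preprocess = transforms.Compose([
    transforms.Resize((256, 256)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.5] * 3, std=[0.5] * 3),  # CycleGAN convention
])

G_night2day = torch.load("g_night2day.pt")          # hypothetical checkpoint
G_night2day.eval()
classifier = torch.load("underpass_classifier.pt")  # hypothetical checkpoint
classifier.eval()

image = preprocess(Image.open("night_frame.jpg")).unsqueeze(0)
with torch.no_grad():
    daytime_like = G_night2day(image)  # transform OOD input to the ODD
    logits = classifier(daytime_like)  # classify the transformed image
print(logits.argmax(dim=1))
```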

The first image in this post shows another, more speculative use of CycleGANs. As the camera was vandalized during the project (the camera dome was spray-painted), we tried training a CycleGAN to transform spray-disrupted images back to the standard ODD. We didn't have a large enough dataset to make it work well, but we found the initial results promising.

To the left, out-of-distribution images from the nighttime domain. To the right, the same images CycleGAN-transformed to the daytime ODD. The classifier performs better on the transformed images.

Implications for Research and Practice

  • Neural network pruning can enable AI/ML on constrained devices.
  • CycleGANs can be used to bring out-of-distribution input to the operational design domain of the application.
  • CycleGANs might be used to recover image classification performance after antagonistic attacks.

August Lidfeldt, Daniel Isaksson, Ludwig Hedlund, Simon Åberg, Markus Borg, and Erik Larsson. Enabling Image Recognition on Constrained Devices Using Neural Network Pruning and a CycleGAN. In Proc. of the 10th International Conference on the Internet of Things Companion, 2020. (preprint, code)

Abstract

Smart cameras are increasingly used in surveillance solutions in public spaces. Contemporary computer vision applications can be used to recognize events that require intervention by emergency services. Smart cameras can be mounted in locations where citizens feel particularly unsafe, e.g., pathways and underpasses with a history of incidents. One promising approach for smart cameras is edge AI, i.e., deploying AI technology on IoT devices. However, implementing resource-demanding technology such as image recognition using deep neural networks (DNN) on constrained devices is a substantial challenge. In this paper, we explore two approaches to reduce the need for compute in contemporary image recognition in an underpass. First, we showcase successful neural network pruning, i.e., we retain comparable classification accuracy with only 1.1% of the neurons remaining from the state-of-the-art DNN architecture. Second, we demonstrate how a CycleGAN can be used to transform out-of-distribution images to the operational design domain. We posit that both pruning and CycleGANs are promising enablers for efficient edge AI in smart cameras.