Image Classification of CIFAR-10 Dataset with Convolutional Neural Network, Part 2: Methodology and Experiments

Methods and experiments applied in the image classification tasks of the CIFAR-10 dataset

Published on 6 December 2023

In the second article of the series about image classification of the CIFAR-10 dataset with a Convolutional Neural Network, I'm going to discuss the methods and experiments applied to classify the images.

Methodology and Experiments

This work follows an experimental approach to designing a collection of CNNs, with an emphasis on optimisers. All methods were based on the techniques identified in the previous article, including those available in the TensorFlow library (TensorFlow, 2022) and the broader literature. The experiment proceeds in three stages: first, define and train the baseline model on normalised and one-hot encoded data; second, apply different regularisation and optimisation techniques and retrain the model until results improve; and finally, change the network architecture to find the optimal solution. Each model was compared with the baseline to ensure that it meets the criteria identified in the first article of this series.

Hardware Configuration

The experiment was conducted on a 2015 laptop with 16GB of memory and a 2.2GHz quad-core Intel Core i7 CPU. The tools and materials used in the experiment were VS Code, Python, TensorFlow, scikit-learn (Pedregosa et al., 2011), Pandas, NumPy, Seaborn, and Matplotlib. Models were trained on the popular set of tiny images available in the CIFAR-10 dataset (Krizhevsky and Hinton, 2009). The choice of tools is justified by their open-source nature, popularity, and wide adoption in machine learning research.

Defining the Baseline Model

The baseline model in this experiment was inspired by a VGG-style ‘blocky’ architecture identified in the literature review. According to Thakkar et al. (2018) and Brownlee (2019), VGG is a good starting point because of its modularity and easy-to-understand architecture. However, all experiments were limited to two VGG blocks for simplicity.

All modified architectures were compared with the baseline to determine whether the applied regularisation and optimisation techniques had a positive impact on computation time and performance. Model training followed Simonyan and Zisserman’s (2014) procedure. The baseline model was made of two convolutional layers, each detecting 32 features with 3x3 convolutional filters applied across the three colour channels, and a ReLU activation function to improve accuracy (Chollet, 2019). The output of the convolutional layers was passed into a max-pooling layer to reduce overfitting (Abouelnaga et al., 2016). That output was then flattened (Brownlee, 2019) and passed into a fully connected layer of 128 neurons to perform the final computation, followed by a dense output layer of 10 neurons, one for each class in the dataset, with a softmax activation function to produce the class probabilities (Abouelnaga et al., 2016).
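To make the architecture concrete, here is a minimal Keras sketch of the baseline as described above; the ‘same’ padding and 2x2 pool size are illustrative choices rather than settings stated in the text.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_baseline():
    """One VGG-style block followed by a small classifier head."""
    model = models.Sequential([
        # Two 3x3 convolutional layers detecting 32 features each
        layers.Conv2D(32, (3, 3), activation='relu', padding='same',
                      input_shape=(32, 32, 3)),
        layers.Conv2D(32, (3, 3), activation='relu', padding='same'),
        # Max-pooling downsamples the feature maps to help reduce overfitting
        layers.MaxPooling2D((2, 2)),
        # Flatten the feature maps into a vector for the dense layers
        layers.Flatten(),
        layers.Dense(128, activation='relu'),
        # One output neuron per CIFAR-10 class
        layers.Dense(10, activation='softmax'),
    ])
    return model
```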

Experiments

The dataset was downloaded and split into training and test sets with their respective labels. Both sets were normalised by converting pixel values to floats and dividing by 255, fitting each value within the range 0 to 1 (Chollet, 2018; Brownlee, 2019). The target values were one-hot encoded to represent the sample labels, so categorical rather than binary cross-entropy was used as the loss function when compiling the model (Chollet, 2018). Zhang et al. (2018) recommended setting the learning rate to 0.0001, but this was deemed too low by Çalik and Demirci (2018), who argued for a learning rate of 0.1; according to Thakkar et al. (2018), however, a good starting point is 0.001. Each model, except those using augmented data, was validated using 20% of the training data, as recommended by Chansung (2018) and Zhang et al. (2018). The baseline model was trained for 10 epochs, and a mini-batch size of 64 samples was used throughout all experiments. The key focus was to test the baseline model with the three optimisers identified in the Optimisation Techniques section of the previous article, SGD, RMSProp, and Adam, in order to eliminate the one demonstrating the worst performance.
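A rough sketch of the preprocessing and training loop described above; SGD is shown here, with RMSProp and Adam swapped in the same way.

```python
import tensorflow as tf
from tensorflow.keras.datasets import cifar10
from tensorflow.keras.utils import to_categorical

# Load the dataset, already split into training and test sets
(x_train, y_train), (x_test, y_test) = cifar10.load_data()

# Normalise: convert to floats and scale pixel values into [0, 1]
x_train = x_train.astype('float32') / 255.0
x_test = x_test.astype('float32') / 255.0

# One-hot encode the labels so categorical cross-entropy can be used
y_train = to_categorical(y_train, 10)
y_test = to_categorical(y_test, 10)

# build_baseline() as defined in the earlier sketch
model = build_baseline()
model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=0.001),
              loss='categorical_crossentropy',
              metrics=['accuracy'])

# 10 epochs, mini-batches of 64, 20% of training data held out for validation
history = model.fit(x_train, y_train, epochs=10, batch_size=64,
                    validation_split=0.2)
```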

Optimisation

The baseline model, with a dropout rate of 0.2 and batch normalisation, was passed into Keras Tuner to automatically find the optimal set of hyperparameters (TensorFlow, 2022). Based on the results, the number of units in the dense layer of the model was increased to 512. When training deeper networks, Géron (2019) recommends increasing the mini-batch size; however, since the network consisted of only one VGG block, the batch size and learning rate remained unchanged. To further normalise each input, batch normalisation was applied just after the hidden layer of the VGG block, as proposed by Géron (2019).
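The article doesn't record which search strategy was used, so this sketch assumes Keras Tuner's RandomSearch; the tuned hyperparameter here is the dense-layer width, for which 512 came out on top.

```python
import keras_tuner as kt
from tensorflow.keras import layers, models

def build_tunable_model(hp):
    model = models.Sequential([
        layers.Conv2D(32, (3, 3), activation='relu', padding='same',
                      input_shape=(32, 32, 3)),
        # Batch normalisation just after the hidden layer of the VGG block
        layers.BatchNormalization(),
        layers.Conv2D(32, (3, 3), activation='relu', padding='same'),
        layers.MaxPooling2D((2, 2)),
        layers.Flatten(),
        # Let the tuner choose the dense-layer width
        layers.Dense(hp.Choice('units', [128, 256, 512]), activation='relu'),
        layers.Dropout(0.2),
        layers.Dense(10, activation='softmax'),
    ])
    model.compile(optimizer='adam', loss='categorical_crossentropy',
                  metrics=['accuracy'])
    return model

tuner = kt.RandomSearch(build_tunable_model, objective='val_accuracy',
                        max_trials=10)
tuner.search(x_train, y_train, epochs=10, validation_split=0.2)
best_model = tuner.get_best_models(num_models=1)[0]
```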

To further optimise the accuracy of the model, several authors recommend augmenting the training data, for example with techniques such as padding, zoom, and width and height shifts. As identified in the Regularisation Techniques section of the previous article, and as suggested by Brownlee (2019), data augmentation was applied to the training set to increase the number of samples, in an attempt to reduce overfitting and improve accuracy. When training the model on augmented data, the number of VGG blocks initially remained the same and was then increased by one block to compare the two architectures, with weight decay added to further optimise the model.
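A sketch of the augmentation step using Keras' ImageDataGenerator; the exact shift and zoom ranges below are illustrative assumptions, and validation here falls back to the test set since the 20% split was not used for augmented runs.

```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Augment the training set on the fly; ranges are illustrative assumptions
datagen = ImageDataGenerator(
    width_shift_range=0.1,   # random horizontal shifts
    height_shift_range=0.1,  # random vertical shifts
    zoom_range=0.1,          # random zoom in/out
)
# x_train, y_train, and model as prepared in the earlier sketches
it_train = datagen.flow(x_train, y_train, batch_size=64)

# steps_per_epoch so that one epoch covers the whole training set once
history = model.fit(it_train, epochs=10,
                    steps_per_epoch=len(x_train) // 64,
                    validation_data=(x_test, y_test))
```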

Regularisation

To reduce overfitting, a dropout layer was added between the dense and output layers, as suggested by Thakkar et al. (2018). A dropout rate of 0.2 was applied (Krizhevsky and Hinton, 2010) to randomly set inputs of the layer to zero and prevent excessive overfitting (Chollet, 2018). Batch normalisation was applied just after the convolutional and dense layers to help with gradient propagation as the network becomes deeper (Chollet, 2018). Wang et al. (2017) and Chollet (2018) proposed the L2 regularisation technique to further reduce the overfitting of the model. In addition, a learning rate decay scheduler was created to decrease the rate by 10% (Çalik and Demirci, 2018), first after each epoch and then in every third epoch. Finally, the number of epochs was increased, as suggested by Chollet (2018), from 10 to an arbitrary 20 epochs to find the point at which the training loss reaches its minimum.
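Both decay variants can be expressed as a Keras LearningRateScheduler callback; a minimal sketch:

```python
from tensorflow.keras.callbacks import LearningRateScheduler

def decay_every_epoch(epoch, lr):
    # Variant 1: decrease the learning rate by 10% after every epoch
    return lr * 0.9

def decay_every_third_epoch(epoch, lr):
    # Variant 2: decrease by 10% only on every third epoch
    return lr * 0.9 if epoch > 0 and epoch % 3 == 0 else lr

# 20 epochs, with the chosen schedule attached as a callback
history = model.fit(x_train, y_train, epochs=20, batch_size=64,
                    validation_split=0.2,
                    callbacks=[LearningRateScheduler(decay_every_epoch)])
```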

‘Bottleneck’ Architecture

The next three experiments tested variants of a ‘bottleneck’ architecture. To achieve this, the baseline model was modified by adding two additional convolutional layers, each with fewer filters than the preceding layers. The two layers were followed by max-pooling and dropout layers. The architectures were tested using the RMSProp and Adam optimisers, each with 8, 16, and 32 filters, to determine which variant performed best. The best-performing model was regularised using L2 with a rate of 0.001 and retrained on augmented data.
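A sketch of one bottleneck variant; the exact layer placement and the 512-unit head are assumptions carried over from the earlier experiments, and the L2 regulariser is shown here even though in the experiments it was applied only to the best-performing variant.

```python
from tensorflow.keras import layers, models, regularizers

def build_bottleneck(filters=8):
    """Baseline block plus a narrower 'bottleneck' block (8, 16, or 32 filters)."""
    model = models.Sequential([
        layers.Conv2D(32, (3, 3), activation='relu', padding='same',
                      input_shape=(32, 32, 3)),
        layers.Conv2D(32, (3, 3), activation='relu', padding='same'),
        layers.MaxPooling2D((2, 2)),
        # Bottleneck: two additional conv layers with fewer filters,
        # regularised with L2 at a rate of 0.001
        layers.Conv2D(filters, (3, 3), activation='relu', padding='same',
                      kernel_regularizer=regularizers.l2(0.001)),
        layers.Conv2D(filters, (3, 3), activation='relu', padding='same',
                      kernel_regularizer=regularizers.l2(0.001)),
        layers.MaxPooling2D((2, 2)),
        layers.Dropout(0.2),
        layers.Flatten(),
        layers.Dense(512, activation='relu'),
        layers.Dense(10, activation='softmax'),
    ])
    return model
```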

Final Thoughts

I feel like conducting these experiments helped me get a grasp of the code behind image classification of the CIFAR-10 dataset; however, I'm far from being an expert. It's easy to follow what others did, but when it comes to creating something original, things can get difficult. Nevertheless, the goal of this project is to learn and understand image classification. Moreover, I want to create a model with at least 70% accuracy, and its training and validation accuracy and loss curves need to match as closely as possible. In the next article I will find out whether my experimentation worked out.

References

Abouelnaga, Y., Ali, O.S., Rady, H. and Moustafa, M., 2016. CIFAR-10: KNN-based Ensemble of Classifiers. In 2016 International Conference on Computational Science and Computational Intelligence (CSCI), pp. 1192-1195. IEEE.

Brownlee, J., 2019. How to Develop a CNN From Scratch for CIFAR-10 Photo Classification. [Online] Available at: https://machinelearningmastery.com/how-to-develop-a-cnn-from-scratch-for-cifar-10-photo-classification/ [Accessed: 16 April 2022].

Çalik, R.C., and Demirci, M.F., 2018. Cifar-10 Image Classification with Convolutional Neural Networks for Embedded Systems. In 2018 IEEE/ACS 15th International Conference on Computer Systems and Applications (AICCSA), pp. 1-2, doi:10.1109/AICCSA.2018.8612873.

Géron, A., 2019. Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems. O’Reilly Media, Inc.

Krizhevsky, A. and Hinton, G., 2009. Learning multiple layers of features from tiny images.

Simonyan, K. and Zisserman, A., 2014. Very Deep Convolutional Networks for Large-Scale Image Recognition. arXiv preprint arXiv:1409.1556.

Thakkar, V., Tewary, S. and Chakraborty, C., 2018. Batch Normalization in Convolutional Neural Networks - A comparative study with CIFAR-10 data. In 2018 Fifth International Conference on Emerging Applications of Information Technology (EAIT), pp. 1-5, doi:10.1109/EAIT.2018.8470438.

TensorFlow, 2022. TensorFlow Core v2.8.0. [Online] Available at: https://www.tensorflow.org/api_docs [Accessed: 24 April 2022].

Zhang, X., Zhou, X., Lin, M. and Sun, J., 2018. ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848-6856.
