This paper presents an analysis of pre-trained models for recognizing handwritten Devanagari alphabets using transfer learning for deep convolutional neural networks (DCNN). This research uses AlexNet, DenseNet, Vgg, and Inception ConvNets as fixed feature extractors.
We trained each of AlexNet, DenseNet 121, DenseNet 201, Vgg 11, Vgg 16, Vgg 19, and Inception V3 for 15 epochs. Results show that Inception V3 performs best in terms of accuracy, achieving 99% accuracy with an average epoch time of 16.3 minutes, while AlexNet is the fastest at 2.2 minutes per epoch and achieves 98% accuracy.
Challenge: many characters in the Devanagari dataset look similar.
Variations of one example character, showing how different people may write it.
Comparison of Best Accuracy

|Model|Validation accuracy in 1st epoch (%)|Best accuracy in 15 epochs (%)|Epoch of best accuracy|Total time (15 epochs)|Average training time per epoch|Feature maps|Parameters (million)|
|---|---|---|---|---|---|---|---|
|DenseNet 121|73|89|7|80m 3s|5.3m|1024|25|
|DenseNet 201|74|90|6|113m 22s|7.6m|1920|20|
|Vgg 11|97|99|8|86m 6s|5.7m|4096|134|
|Vgg 16|97|98|3|132m 12s|8.8m|25088|138|
|Vgg 19|96|98|3|148m 57s|9.9m|25088|144|
|Inception V3|99|99|1|244m 36s|16.3m|2048|25|
The results are reported in terms of the accuracy achieved in the first epoch, the best accuracy achieved in 15 epochs, the epoch in which the best accuracy was achieved, the total time taken by 15 epochs, the average epoch time, and the number of features required by the model architecture.
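As a quick consistency check, the average epoch time in the table can be recomputed from the total time over 15 epochs. The sketch below is illustrative only; the helper names `to_seconds` and `average_epoch_minutes` are hypothetical, and the durations are copied from the table.

```python
import re

EPOCHS = 15

# Total training times for 15 epochs, copied from the table.
TOTAL_TIME = {
    "DenseNet 121": "80m 3s",
    "DenseNet 201": "113m 22s",
    "Vgg 11": "86m 6s",
    "Vgg 16": "132m 12s",
    "Vgg 19": "148m 57s",
    "Inception V3": "244m 36s",
}


def to_seconds(duration: str) -> int:
    """Convert a string such as '80m 3s' to a number of seconds."""
    match = re.fullmatch(r"(\d+)m\s*(\d+)s", duration.strip())
    if match is None:
        raise ValueError(f"unrecognized duration: {duration!r}")
    minutes, seconds = map(int, match.groups())
    return 60 * minutes + seconds


def average_epoch_minutes(duration: str, epochs: int = EPOCHS) -> float:
    """Average minutes per epoch, rounded to one decimal place."""
    return round(to_seconds(duration) / epochs / 60, 1)


averages = {name: average_epoch_minutes(t) for name, t in TOTAL_TIME.items()}
```

Each recomputed value agrees with the "average training time per epoch" column, for example 5.3 minutes for DenseNet 121 and 16.3 minutes for Inception V3.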
Results show that Inception performed best on our dataset, reaching 99% accuracy in the first epoch with an average epoch time of 16.3 minutes, due to the regularization imposed by its larger number of layers and smaller convolution filter sizes. Moreover, the computational cost of reaching this accuracy is lower for Inception than for the second-best model, Vgg 11, which reaches 99% accuracy in 45.6 minutes (eight epochs at an average of 5.7 minutes per epoch), compared with a single Inception epoch of 16.3 minutes.
The DenseNet models performed worst, owing to their architecture, in which the connections of each layer are fed to its subsequent layers. Our dataset is much smaller than what the redundancy passed to subsequent layers requires. AlexNet performed best in terms of computational cost, reaching 98% accuracy in 6.6 minutes (three epochs).
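The time-to-best-accuracy figures quoted above follow from multiplying the epoch in which the best accuracy was reached by the average epoch time. A minimal sketch of that arithmetic, using only numbers stated in the text and table (the function name `minutes_to_best` is hypothetical):

```python
# (epoch of best accuracy, average minutes per epoch), from the text and table.
MODELS = {
    "AlexNet": (3, 2.2),        # 98% accuracy
    "Vgg 11": (8, 5.7),         # 99% accuracy
    "Inception V3": (1, 16.3),  # 99% accuracy
}


def minutes_to_best(best_epoch: int, minutes_per_epoch: float) -> float:
    """Approximate training time needed to reach the best accuracy."""
    return round(best_epoch * minutes_per_epoch, 1)


time_to_best = {name: minutes_to_best(*values) for name, values in MODELS.items()}
```

This gives 6.6 minutes for AlexNet, 45.6 minutes for Vgg 11, and 16.3 minutes for Inception V3. Because it uses the average epoch time, it is an estimate rather than a measured wall-clock value.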