Transfer Learning using CNN for Handwritten Devanagari Character Recognition

Published in IEEE International Conference on Advances in Information Technology (ICAIT), 2019

Recommended citation: N. Aneja and S. Aneja, "Transfer Learning using CNN for Handwritten Devanagari Character Recognition," 2019 1st International Conference on Advances in Information Technology (ICAIT), Chikmagalur, India, 2019, pp. 293-296.


This paper presents an analysis of pre-trained models for recognizing handwritten Devanagari alphabets using transfer learning with Deep Convolutional Neural Networks (DCNN). This research implements AlexNet, DenseNet, Vgg, and Inception ConvNets as fixed feature extractors.

We trained each of AlexNet, DenseNet 121, DenseNet 201, Vgg 11, Vgg 16, Vgg 19, and Inception V3 for 15 epochs. Results show that Inception V3 achieves the highest accuracy, 99%, with an average epoch time of 16.3 minutes, while AlexNet is the fastest at 2.2 minutes per epoch and achieves 98% accuracy.

Problem Statement

Devanagari dataset

The challenge is that many characters in the Devanagari dataset look similar

Variations of one example character as written by different people


  • Comparison of Best Accuracy
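| Model | Valid Accuracy (in 1st epoch) | Best Accuracy (in 15 epochs) | Best Accuracy achieved in # epochs | Total Time (15 epochs) | Average Training Time per Epoch | Feature Maps | Parameters (million) |
|---|---|---|---|---|---|---|---|
| AlexNet | 95 | 98 | 3 | 33m 8s | 2.2m | 9216 | 60 |
| DenseNet 121 | 73 | 89 | 7 | 80m 3s | 5.3m | 1024 | 25 |
| DenseNet 201 | 74 | 90 | 6 | 113m 22s | 7.6m | 1920 | 20 |
| Vgg 11 | 97 | 99 | 8 | 86m 6s | 5.7m | 4096 | 134 |
| Vgg 16 | 97 | 98 | 3 | 132m 12s | 8.8m | 25088 | 138 |
| Vgg 19 | 96 | 98 | 3 | 148m 57s | 9.9m | 25088 | 144 |
| Inception V3 | 99 | 99 | 1 | 244m 36s | 16.3m | 2048 | 25 |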

The results are reported in terms of the accuracy achieved in the first epoch, the best accuracy achieved in 15 epochs, the epoch in which the best accuracy was achieved, the total time taken by 15 epochs, the average epoch time, and the number of features required by the model architecture.

Results show that Inception V3 performed best on our dataset, reaching 99% accuracy in the first epoch with an average epoch time of 16.3 minutes, which we attribute to the regularization imposed by its larger number of layers and smaller convolution filter sizes. Its computational cost to reach that accuracy is also lower than that of the second-best model, Vgg 11, which reaches 99% accuracy in 45.6 minutes (eight epochs at an average of 5.7 minutes per epoch), compared with a single 16.3-minute epoch for Inception.

The DenseNet models performed worst, which we attribute to their architecture: each layer feeds its connections to the subsequent layers, and our dataset is much smaller than the redundancy across those subsequent layers requires. AlexNet was the most efficient in computational cost, reaching 98% accuracy in 6.6 minutes (three epochs at 2.2 minutes per epoch).
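The time-to-best-accuracy figures quoted above (16.3, 45.6, and 6.6 minutes) are the number of epochs needed to reach the best accuracy multiplied by the average epoch time. The short sketch below reproduces that arithmetic for every model, using the values from the comparison table. The names `RESULTS` and `minutes_to_best` are illustrative only.

```python
# model: (best accuracy in %, epoch at which it was reached, average minutes per epoch)
RESULTS = {
    "AlexNet": (98, 3, 2.2),
    "DenseNet 121": (89, 7, 5.3),
    "DenseNet 201": (90, 6, 7.6),
    "Vgg 11": (99, 8, 5.7),
    "Vgg 16": (98, 3, 8.8),
    "Vgg 19": (98, 3, 9.9),
    "Inception V3": (99, 1, 16.3),
}


def minutes_to_best(model: str) -> float:
    """Approximate training time, in minutes, until the best accuracy is reached."""
    _, epochs, avg_minutes = RESULTS[model]
    return round(epochs * avg_minutes, 1)


if __name__ == "__main__":
    # List models from cheapest to most expensive time-to-best-accuracy.
    for name in sorted(RESULTS, key=minutes_to_best):
        accuracy = RESULTS[name][0]
        print(f"{name}: {accuracy}% after about {minutes_to_best(name)} minutes")
```

This is an estimate based on average epoch time, since per-epoch timings are not listed individually.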