A novel similar character discrimination method for online handwritten Urdu character recognition in half forms

Document Type: Article


1 Department of Electrical Engineering, Pakistan Institute of Engineering and Applied Sciences, Islamabad, Pakistan

2 Department of Electronic Engineering, Tsinghua University, Beijing, China


Online handwritten Urdu character recognition is a key technology for intelligent interfaces on smartphones and touch screens. It is a challenging research topic because Urdu script contains many groups of similar characters. This paper proposes a novel similar-character discrimination method for online handwritten Urdu character recognition, comprising pre-classification, feature extraction, and fine classification. The pre-classifier supports the discrimination of similar characters by placing them in distinct, smaller subsets according to stroke number and diacritics. Structural features and wavelet features are then extracted. Finally, Support Vector Machine (SVM), Artificial Neural Network (ANN), and Recurrent Neural Network (RNN) classifiers are compared for fine classification within subsets. The RNN classifier was also evaluated without the proposed pre-classifier and features to check its end-to-end capability. Experimental results show that the proposed method is efficient and achieves an overall accuracy of 96% on a large-scale self-collected dataset. The method could feasibly be extended to other Arabic scripts.
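The pre-classification and wavelet steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `Sample` type, the diacritic tags, and the function names are hypothetical, and a single-level Haar decomposition stands in for the wavelet family, which the abstract does not specify.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple

Point = Tuple[float, float]


@dataclass
class Sample:
    """One handwritten character: its pen strokes plus a diacritic tag.

    The diacritic tags used here ("none", "dots_above", ...) are
    hypothetical; the abstract does not list the actual categories.
    """
    label: str
    strokes: List[List[Point]]
    diacritic: str


def subset_key(sample: Sample) -> Tuple[int, str]:
    """Pre-classification key: (number of strokes, diacritic tag)."""
    return (len(sample.strokes), sample.diacritic)


def pre_classify(samples: List[Sample]) -> Dict[Tuple[int, str], List[Sample]]:
    """Group samples into smaller subsets; a fine classifier is then
    trained and applied within each subset separately."""
    subsets: Dict[Tuple[int, str], List[Sample]] = defaultdict(list)
    for sample in samples:
        subsets[subset_key(sample)].append(sample)
    return dict(subsets)


def haar_level(signal: List[float]) -> Tuple[List[float], List[float]]:
    """One level of an unnormalised Haar transform (an assumed wavelet).

    Returns pairwise means (approximation) and pairwise half-differences
    (detail). An odd-length signal is padded by repeating its last value.
    """
    if len(signal) % 2:
        signal = signal + [signal[-1]]
    evens, odds = signal[0::2], signal[1::2]
    approx = [(a + b) / 2 for a, b in zip(evens, odds)]
    detail = [(a - b) / 2 for a, b in zip(evens, odds)]
    return approx, detail


# Made-up samples: "A" and "B" share a stroke count but differ in diacritic,
# so they land in different subsets before any fine classification.
stroke = [(0.0, 0.0), (1.0, 1.0)]
samples = [
    Sample("A", [stroke, stroke], "dots_above"),
    Sample("B", [stroke, stroke], "dots_below"),
    Sample("C", [stroke], "none"),
]
subsets = pre_classify(samples)
```

In a full pipeline, the approximation coefficients of each stroke's x and y sequences would be combined with structural features and passed to the SVM, ANN, or RNN classifier trained for the matching subset.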

