Chapter 7 Conclusions and Future Scope

This thesis has proposed techniques for recognizing handwritten Hindi text by segmenting and classifying its characters. The problems that arise in handwritten Hindi text written by different persons were identified through careful analysis of the text, and new techniques for segmentation, feature extraction and recognition were developed to solve them.

Four new segmentation algorithms have been proposed in the present work: text line segmentation, segmentation of half characters, segmentation of lower modifiers, and segmentation of a touching left modifier from the consonant in the middle region of the word. A new technique based on header line and base line detection has been proposed to segment overlapped lines of handwritten Hindi text. Determining the header line is difficult, because the position of the header line of a text line and that of the header line of a particular word in the same line may differ. It is equally difficult to determine the presence of a lower modifier, a half character, a left modifier touching a consonant in the middle region, or touching characters. New threshold values have been proposed for detecting their presence in a word. For segmentation of half characters from consonants, structural properties of the text are considered. For segmentation of lower modifiers, a new technique based on the shape of the lower modifiers is proposed. For segmentation of a left modifier from a touching consonant in the middle region of the word, a technique based on the position and length of the left modifier is proposed. To validate the algorithms, they were also tested on printed Hindi text, with satisfactory results.

After segmentation of the text, features are extracted for recognition. A new feature set based on topological features, or structural properties, of the text has been proposed. A new feature extraction technique called merging of features has also been proposed in the present work. A particular feature of a character may depend on another feature of the same character. In such cases, the next feature is extracted only if the previous feature is present. This reduces the number of features to be extracted. Further, the problems in feature extraction were identified and several heuristics were applied to solve them. The overall results obtained with the proposed algorithms for segmentation and recognition of handwritten Hindi text are very encouraging. SVM and rule based classifiers are used for classification of the characters of handwritten Hindi text.

7.1 Contributions of the Work

This thesis has made the following contributions in the field of handwritten Hindi text recognition:

i) To the best of the researcher's knowledge, this is the first attempt towards the development of OCR for handwritten Hindi text.
ii) New techniques are proposed for segmentation of overlapped lines of handwritten Hindi text, segmentation of conjuncts (half characters), segmentation of lower modifiers and segmentation of touching modifiers or consonants in the middle region.
iii) A new feature set has been proposed which contains slant and size invariant features. The topological features that are extracted are very robust.
iv) A new technique is proposed for feature extraction to increase the speed of recognition. Not all features are extracted from each character; only the main features and the unique features of that character are extracted.
v) A new technique is proposed for word recognition.
vi) Rule based and SVM classifiers are used for character recognition.

7.2 Future Scope

The proposed algorithms for segmentation of handwritten Hindi text can be extended to the recognition of other Indian scripts. The segmentation algorithms can be modified further to improve the accuracy of segmentation, and new features can be added to improve the accuracy of recognition. These algorithms can be tested on a large database of handwritten Hindi text. There is a need to develop a standard database for recognition of handwritten Hindi text. The proposed work can be extended to handle degraded text and broken characters. Recognition of digits in the text, half characters and compound characters can be added to improve the word recognition rate.
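As an illustration of the header line detection on which the line segmentation summarized in this chapter relies, the following is a minimal sketch of a commonly used heuristic: take the row with the most ink in the upper half of a binarized word image. This is not the thresholding scheme developed in the thesis. The function names, the 0/1 pixel convention and the upper-half restriction are all assumptions made for illustration.

```python
from typing import List

# Binary image as rows of pixels: 1 = ink, 0 = background (assumed convention).
Image = List[List[int]]


def horizontal_projection(image: Image) -> List[int]:
    """Count the ink pixels in every row of the image."""
    return [sum(row) for row in image]


def find_header_line(image: Image) -> int:
    """Return the index of the row most likely to be the header line.

    Heuristic (an assumption, not the thesis's thresholds): the header
    line is the row with the most ink within the upper half of the word.
    """
    if not image:
        raise ValueError("empty image")
    profile = horizontal_projection(image)
    upper = profile[: max(1, len(profile) // 2)]
    # list.index returns the first row that reaches the maximum ink count.
    return upper.index(max(upper))


# Toy 6x6 "word": a full-width stroke in row 1 with two vertical stems below.
word = [
    [0, 0, 1, 0, 0, 0],
    [1, 1, 1, 1, 1, 1],
    [0, 1, 0, 0, 1, 0],
    [0, 1, 1, 0, 1, 0],
    [0, 1, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 0],
]
header_row = find_header_line(word)
```

Restricting the search to the upper half keeps a wide stroke in the lower part of the word from being mistaken for the header line. The sketch assumes a straight, unbroken header line; handwritten header lines are often slanted or broken and vary from word to word within a line, which is the difficulty noted in the conclusions.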