Optical Character Recognition System for Urdu (Naskh Font) Using Pattern Matching Technique
Source: International Journal of Image Processing (IJIP)
Volume: 3, Issue: 3
Publication Date: June 2009
ISSN (Online): 1985-2304
Pages: 92-104
Published Date: 01-09-2009
Publisher: CSC Journals, Kuala Lumpur, Malaysia
KEYWORDS: Pattern matching, chain code creation, morphology, segmentation, training system, recognition system
This manuscript is indexed in the following databases/websites:

1. Directory of Open Access Journals (DOAJ)
2. ScientificCommons
3. OpenJ-Gate
4. Scribd
5. PDFCAST
6. Docstoc
7. CiteSeerX
8. WorldCat
9. Google Scholar
10. Academic Index
11. refSeek
12. ResearchGATE
13. Bielefeld Academic Search Engine (BASE)
14. Microsoft Academic Search
15. iSEEK
16. Socol@r
Offline optical character recognition (OCR) systems have been developed for many languages in recent years. The US Postal Service has used OCR to automate its services since 1965, and the range of applications keeps growing because the technique is useful in almost every major area of the government and private sectors. OCR has also helped many large organizations move toward a paper-free environment by digitizing their existing file records. The system proposed here performs offline recognition of isolated Urdu characters. Urdu is a cursive language in which words are formed by joining isolated characters, so the isolated characters are the units to be recognized. The main application of an Urdu OCR will be digitizing the large body of literature already held in libraries. Urdu is widely known and is spoken in several large countries, including Pakistan, India and Bangladesh, and a great deal of Urdu poetry and literature has been produced up to the last century. An Urdu OCR can play an important role in moving this work from physical libraries to electronic ones. Much of the Urdu material already on the Internet is stored as images of text, which take a lot of space to transfer and are hard to read online, so an Urdu OCR is clearly needed. The system has two parts. The training system performs image preprocessing and line and character segmentation, and creates an XML file for training. The recognition system takes the XML file and the image to be recognized, segments the image, creates chain codes for the character images and matches them against the chain codes already stored in the XML file. The implemented system achieves 89% recognition accuracy at a recognition rate of 15 characters per second.
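The abstract lists line and character segmentation as steps of the training system but does not say how they are performed. One common approach for printed text, used here only as an assumption, is projection-profile segmentation: after binarization, rows containing no ink separate text lines, and columns containing no ink separate isolated characters. The sketch below assumes a fixed global threshold and plain Python lists; the function names (`binarize`, `runs`, `segment_lines`, `segment_characters`) are hypothetical and not taken from the paper.

```python
from typing import List, Tuple

Image = List[List[int]]  # binary image: 1 = ink (foreground), 0 = background


def binarize(gray: List[List[int]], threshold: int = 128) -> Image:
    """Global threshold (an assumption): pixels darker than `threshold` become ink."""
    return [[1 if px < threshold else 0 for px in row] for row in gray]


def runs(profile: List[int]) -> List[Tuple[int, int]]:
    """Return [start, end) index ranges where the projection profile is non-zero."""
    spans, start = [], None
    for i, value in enumerate(profile):
        if value and start is None:
            start = i
        elif not value and start is not None:
            spans.append((start, i))
            start = None
    if start is not None:
        spans.append((start, len(profile)))
    return spans


def segment_lines(img: Image) -> List[Image]:
    """Split a page into text lines using the horizontal projection (ink per row)."""
    row_profile = [sum(row) for row in img]
    return [img[top:bottom] for top, bottom in runs(row_profile)]


def segment_characters(line: Image) -> List[Image]:
    """Split one text line into isolated characters using the vertical projection.

    Segments are returned right to left, because Urdu is written right to left.
    """
    width = len(line[0])
    col_profile = [sum(row[x] for row in line) for x in range(width)]
    chars = [[row[left:right] for row in line] for left, right in runs(col_profile)]
    return chars[::-1]
```

A character made of several parts (for example, a letter body with dots above or below it) stays in one segment only when the parts overlap in their columns; touching or kerned characters would need more than blank-column splitting.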
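The recognition step described in the abstract compares the chain codes of segmented characters with chain codes stored in an XML training file. The abstract does not specify the chain-code variant, the XML layout or the matching rule, so the sketch below makes three labeled assumptions: an 8-direction Freeman chain code of the outer boundary (traced with a Moore-neighbor search, 0 = east, numbered counterclockwise), a hypothetical `<characters><char label="..." chaincode="..."/></characters>` file layout, and nearest-neighbor matching by edit distance between chain-code strings. The names `chain_code`, `edit_distance`, `save_training` and `recognize` are illustrative only.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

Image = List[List[int]]  # binary character image: 1 = ink, 0 = background

# Freeman 8-direction offsets as (row, col); 0 = east, numbered counterclockwise.
OFFSETS = [(0, 1), (-1, 1), (-1, 0), (-1, -1), (0, -1), (1, -1), (1, 0), (1, 1)]


def chain_code(img: Image) -> str:
    """Freeman chain code of the outer boundary of the first component in raster order."""
    height = len(img)
    width = len(img[0]) if height else 0

    def ink(r: int, c: int) -> bool:
        return 0 <= r < height and 0 <= c < width and img[r][c] == 1

    start = next(((r, c) for r in range(height) for c in range(width) if img[r][c]), None)
    if start is None:
        return ""  # blank image
    points, codes = [start], []
    current, direction = start, 7
    while True:
        # Moore-neighbor search: start one (even move) or two (odd move) steps
        # clockwise of the previous move, then scan counterclockwise.
        first = (direction + 7) % 8 if direction % 2 == 0 else (direction + 6) % 8
        for k in range(8):
            d = (first + k) % 8
            r, c = current[0] + OFFSETS[d][0], current[1] + OFFSETS[d][1]
            if ink(r, c):
                break
        else:
            return ""  # isolated single pixel: no boundary moves
        codes.append(d)
        current, direction = (r, c), d
        points.append(current)
        # Stop once the first boundary move (P0 -> P1) is about to repeat.
        if len(points) > 2 and points[-1] == points[1] and points[-2] == points[0]:
            break
    return "".join(str(d) for d in codes[:-1])  # drop the repeated P0 -> P1 move


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two chain-code strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def save_training(samples: Dict[str, str]) -> str:
    """Serialize {label: chain code} pairs into the hypothetical XML training layout."""
    root = ET.Element("characters")
    for label, code in samples.items():
        ET.SubElement(root, "char", label=label, chaincode=code)
    return ET.tostring(root, encoding="unicode")


def recognize(char_img: Image, training_xml: str) -> str:
    """Return the label whose stored chain code is closest to the character's."""
    code = chain_code(char_img)
    entries = ET.fromstring(training_xml).iter("char")
    best = min(entries, key=lambda e: edit_distance(code, e.get("chaincode", "")))
    return best.get("label", "")
```

Only the first connected component found in raster order is traced, so for dotted letters such as ب or ت the code may describe either the letter body or a dot, depending on which comes first; a fuller system would trace every component. Edit distance also grows with character size, so this sketch assumes training and test characters are rendered at a similar size.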
|
|
| |
|
| |
|
| |
| |
|
| |
|
| |
| 1 |
Faculty of Telecommunication & Information Engineering - University of Engineering and Technology (UET)Taxila |
| |
|
| |
|
| |
|
| Tabassam Nawaz : Colleagues
|
|
| Syed Ammar Hassan Shah Naqvi : Colleagues
|
|
| Habib ur Rehman : Colleagues
|
|
| Anoshia Faiz : Colleagues
|
|