A Technique for the Design and Implementation of an OCR for Printed Nastalique Text

Thesis Info

Access Option

External Link

Author

Sattar, Sohail Abdul

Program

PhD

Institute

NED University of Engineering & Technology

City

Karachi

Province

Sindh

Country

Pakistan

Thesis Completing Year

2009

Thesis Completion Status

Completed

Subject

Computer Science

Language

English

Link

http://prr.hec.gov.pk/jspui/handle/123456789/620

Added

2021-02-17 19:49:13

Modified

2024-03-24 20:25:49

ARI ID

1676727691124


This thesis presents a novel segmentation-free technique for the design and implementation of an OCR (Optical Character Recognition) system for printed Nastalique text. The specific area of the thesis is document understanding and recognition, a branch of computer vision, which is in turn a sub-field of Artificial Intelligence. Optical character recognition is the translation of optically scanned bitmaps of printed or handwritten text into digitally editable data files. OCR systems developed for many of the world's languages are already in efficient use, but none exists for Nastalique, a calligraphic adaptation of the Arabic script, just as Jawi is for Malay. Often a single script with its basic character shapes is adapted for writing multiple languages, e.g. the Roman script for English, German and French, and the Arabic script for Persian, Sindhi, Urdu, Pashtu and Malay. Urdu has 39 characters against Arabic's 28. Each character has two to four different shapes according to its position in the word: isolated, initial, medial and final. Many character shapes have multiple instances and are context sensitive: a character's shape changes with the preceding or the following character. At times even the third or the fourth character may cause a similar change, a dependency that resembles an n-gram model in a Markov chain. Unlike in the Roman script, word and character overlapping in Nastalique makes optical recognition extremely complex. Compared with OCR for Roman-script languages, very little research has been done on Arabic Naskh OCR. Only a few Arabic Naskh OCR systems are available today, and they too are far from perfect, lagging behind Roman-script OCR systems in accuracy. Nastalique is even more complicated than Naskh: it has multiple baselines, more overlapping of characters within a ligature and between adjacent ligatures, vertical stacking of characters in a ligature, and so on.
Urdu has still not attracted researchers’ attention for OCR development, partly because of a lack of funds in this area but mainly because of the challenges the Nastalique style poses through its cursiveness and context sensitivity. For the same reason, published research in this area is nearly non-existent. The proposed Nastalique OCR system does not require segmentation of a ligature into its constituent character shapes. It does, however, require segmentation at two levels: first the text image is segmented into lines of text, then each line is further segmented into ligatures or isolated characters. The next step is a line-by-line cross-correlation to recognize the characters in the ligatures, whereby character codes are written to a text file in the sequence in which the characters are found in the ligature. When recognition is complete, the character codes in the text file are passed to the rendering engine, which displays the recognized text in a text region. The limitation of the proposed system is that it is font dependent: recognition needs the same font file that was used to write the text. The new undertaking has greater challenges, as it aims to overcome the inherent cursiveness and context sensitivity of the Nastalique style of writing. For the Nastalique OCR, we develop character-based TrueType Font (TTF) files for a few Nastalique words. These words are written using the same character-based TTF font, and an image is made of the Nastalique text. The image is then given to our Nastalique OCR. After recognition, the rendering is done with the same TTF font file to display the recognized text. The work is therefore threefold: development of a character-based Nastalique TrueType font, Nastalique character recognition, and rendering of the recognized text using the character-based Nastalique TrueType font.
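The two steps described above, splitting the page into text lines and then matching character shapes by cross-correlation, can be illustrated with a minimal Python sketch. It is not the thesis's Matlab implementation: the row-projection line splitter, the brute-force normalized cross-correlation, the function names `segment_lines` and `normalized_cross_correlation`, and the toy 3x3 "glyph" are all assumptions made for illustration only.

```python
import numpy as np

def segment_lines(binary):
    """Split a binary page image (1 = ink) into (top, bottom) row ranges.

    Assumption: text lines are separated by at least one ink-free row.
    """
    has_ink = binary.sum(axis=1) > 0
    lines, start = [], None
    for row, ink in enumerate(has_ink):
        if ink and start is None:
            start = row                      # a text line begins
        elif not ink and start is not None:
            lines.append((start, row))       # the text line ends
            start = None
    if start is not None:
        lines.append((start, len(has_ink)))
    return lines

def normalized_cross_correlation(image, template):
    """Slide the template over the image; return (best score, (row, col))."""
    th, tw = template.shape
    t = template - template.mean()
    t_norm = np.sqrt((t ** 2).sum())
    best_score, best_pos = -1.0, (0, 0)
    for r in range(image.shape[0] - th + 1):
        for c in range(image.shape[1] - tw + 1):
            w = image[r:r + th, c:c + tw]
            w = w - w.mean()
            denom = np.sqrt((w ** 2).sum()) * t_norm
            if denom == 0:                   # flat window, nothing to match
                continue
            score = float((w * t).sum() / denom)
            if score > best_score:
                best_score, best_pos = score, (r, c)
    return best_score, best_pos

# Toy page: the same hypothetical glyph printed on two text lines.
glyph = np.array([[1, 0, 1],
                  [0, 1, 0],
                  [1, 1, 1]], dtype=float)
page = np.zeros((12, 10))
page[1:4, 2:5] = glyph
page[7:10, 5:8] = glyph

lines = segment_lines(page)
top, bottom = lines[1]
score, pos = normalized_cross_correlation(page[top:bottom], glyph)
```

In a font-dependent system of the kind described, the templates would presumably be rendered from the same font file as the input text; here a single hand-made template stands in for that step.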
Since our character-based, segmentation-free Nastalique OCR algorithm needs a character-based Nastalique text processor as groundwork, we have also proposed a Finite State Nastalique Text Processor Model. It has not yet been implemented, so no results are reported for it. However, this model could serve as an impetus for future research in this challenging field. Optical character recognition for Roman-script languages is almost a solved problem for document images, and researchers are now focusing on extraction and recognition of text from video scenes. This new and emerging field is called Video OCR and has numerous applications, such as video annotation, indexing, retrieval, search, digital libraries, and lecture video indexing. It is attracting research on other scripts such as Chinese but, to the best of our knowledge, no work has yet been reported on Video OCR for Arabic-script languages such as Arabic, Persian and Urdu. As an extension of our Nastalique OCR to Video OCR for Arabic-script languages, we have also performed experiments on video text identification, localization and extraction for recognition. We used a MACH (Maximum Average Correlation Height) filter to identify text regions in video frames; these text regions are then localized and extracted for recognition. All research and development work was done in Matlab 7.0. Experiments and results are reported in the thesis.
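As a rough illustration of how a MACH filter can locate a pattern in a frame, the following Python sketch builds the filter in the frequency domain using the commonly cited generalized form h = m / (αC + βD + γS), where m is the mean training spectrum, D the average power spectrum, S the average similarity measure, and C a noise spectrum. The abstract does not say which formulation or parameters the thesis used, so the white-noise C, the α, β, γ values, the function names, and the synthetic 16x16 data are all assumptions.

```python
import numpy as np

def mach_filter(train, alpha=1e-3, beta=0.0, gamma=1.0):
    """Build a frequency-domain MACH-style filter from training images.

    train: array of shape (N, H, W). The alpha/beta/gamma weights are
    illustrative assumptions, not values taken from the thesis.
    """
    X = np.fft.fft2(train, axes=(-2, -1))
    m = X.mean(axis=0)                      # mean training spectrum
    D = (np.abs(X) ** 2).mean(axis=0)       # average power spectrum
    S = (np.abs(X - m) ** 2).mean(axis=0)   # average similarity measure
    C = np.ones_like(D)                     # white-noise model (assumption)
    return m / (alpha * C + beta * D + gamma * S)

def correlate(frame, H):
    """Circular correlation of a frame with the filter via the FFT."""
    F = np.fft.fft2(frame)
    return np.real(np.fft.ifft2(F * np.conj(H)))

# Synthetic data: noisy copies of a 4x4 block stand in for text patches.
rng = np.random.default_rng(0)
base = np.zeros((16, 16))
base[2:6, 3:7] = 1.0
train = np.stack([base + 0.02 * rng.standard_normal(base.shape)
                  for _ in range(5)])
H = mach_filter(train)

# The same pattern shifted by 3 rows and 5 columns in the test frame.
frame = np.roll(base, shift=(3, 5), axis=(0, 1))
plane = correlate(frame, H)
peak = tuple(int(i) for i in np.unravel_index(np.argmax(plane), plane.shape))
```

The location of the correlation peak gives the shift of the pattern relative to the training position; in a video setting, thresholding the correlation plane would be one way to flag candidate text regions before localization and extraction.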

Similar Article Headings

Shafiq-ur-Rahman Kidwai

Although the late Shafiq-ur-Rahman was not a great man by the ordinary standards of fame and renown, in his self-sacrifice, morals and character, sincerity and action, and quiet, selfless service he surpassed many prominent leaders. He had devoted his life to Jamia Millia and never parted from it, in good times or bad, and it would not be wrong to say that the Jamia survived thanks to his hard work and dedication. He was a Muslim both outwardly and inwardly, and because of his qualities he was well liked in every party. Among the serious-minded circles of both the Congress and the government he carried great standing, weight and moral influence, yet he was so selfless that he never tried to benefit from that influence. He had practical experience of basic education and was an expert in it, which is why the UNO sent him to Indonesia for this work. He was still there when, in the last election, the Congress named him for the Delhi Assembly, but he did not get the chance to make use of this either. Shortly afterwards he fell ill, and after a few months of illness he died on 3 April. He was only 53 at the time of his death, which in the world of politics is the very prime of life. Such sincere and active men will now rarely arise among Muslims. May Allah Almighty bless this embodiment of sincerity with His mercy and forgiveness. (Shah Moinuddin Nadvi, April 1953)

 

The Positive and Negative Aspects of the Views of Dr. Fazlur Rahman (d. 1988)

In 1960 the government of General Ayub Khan (former President of Pakistan) established an institution named Idarah Tahqeeqat Islami (Islamic Research Institute). Dr. Fazlur Rahman, who was a visiting professor at the institute, served as its director for seven years, from 1961 to 1968. He later served as an advisor to the Islamic Ideology Council. He was the first editor of "Fikr-o-Nazar", published by the Institute of Islamic Research. Scholars regarded him as an expert in logic and philosophy as well as in the interpretation of the Qur'an. It is mentioned in the various verses of the Prophet (peace and blessings of Allah be upon him). His opinions on matters such as zakat, animal slaughter, basic laws and family planning, marriage, Sunnah, and revelation became widely known, and because of them he was accused of denying the heavenly nature of the Qur'an. Therefore, the first step towards Islamic thinking regarding the Islamic idea was to put an eye on Islamic law and religious beliefs on Islam. According to their plan, the difference between the Quranic verses and the verses and the laws of the law, is the difference. Regarding the meanings, his axis received: The beginning of the tradition and the meaning of 'the law of the law' is the word and the law. Islamic Laws' Principles Concernedly speaking about issues like Fiqh and Qa'as and al-Azai speak.

Exploring Experiences of Key Stakeholders of the New BEd (Hons) Elementary Programme in a Public Sector University and a Government Elementary College of Education in Karachi, Pakistan

This study explored the key stakeholders' experiences of the BEd (Hons) elementary programme in two teacher education institutions (TEIs): a teacher education department at a public sector university and a government elementary college of education in Karachi, Pakistan. Within the qualitative research tradition, a phenomenological design was employed to explore key stakeholders' perceptions and experiences of selected aspects (i.e. curriculum, teaching, assessment and practicum). Twenty-two participants, including two heads of departments, four teacher educators, four cooperating teachers, four graduate students and eight graduating students, were selected purposefully, thereby ensuring maximum variation sampling. The data were collected through semi-structured interviews, focus group discussions and document analysis. The findings revealed that the programme is appreciated mainly for its new features (e.g. revised curriculum and practicum). Factors pertaining to stakeholders' involvement and programmatic changes facilitated programme delivery, whereas factors relating to resources, conduct of the practicum, teachers' capacity, alignment of the programme with the overall education system, coordination among the related stakeholders, specialisation courses, background of the prospective teachers, and accountability hindered stakeholders' efforts towards programme delivery. However, initiatives at the personal, group and TEI levels (e.g. utilizing personal resources and revising the practicum) were reported to have been taken to overcome the challenges. The study also found that the stakeholders at the university are more privileged in that they enjoy more autonomy in decision-making with regard to programme delivery than their college counterparts, which makes the university more enabling for the programme than the college.
Overall, stakeholders see fewer hopes and opportunities than fears and uncertainties regarding the future of the BEd (Hons) programme. The findings of the study suggest that the programme could be delivered more effectively if the required material resources were provided to the TEIs, capacity building of the teacher educators and the cooperating teachers were ensured, and coordination and alignment were ensured among the various stakeholders at the level of policies and practices. Moreover, the regulatory bodies (e.g. the National Accreditation Council for Teacher Education (NACTE)) have to play an active role in constantly reviewing and monitoring the programme to ensure its relevance and effectiveness. For future studies, it is suggested that a countrywide analysis of the current scenario of teacher education and TEIs be carried out to address the missing links in the system.