Abstract:
Research on Optical Character Recognition (OCR) using Hidden Markov Models (HMMs) remains active.
This thesis proposes a system for recognizing offline isolated Arabic handwritten characters, using HMMs as the classifier and Freeman chain code as the feature extraction method.
We use a scheme that recognizes only the bodies of the characters, since many characters share the same body. Characters with similar bodies are grouped into one class, and each class is represented by a single character body.
This scheme reduces the number of classes to be recognized from 34 characters to 18.
The dataset used in this thesis is the Isolated Handwritten Arabic Characters (IHAC) dataset, which was collected by the Arabic Language Technology Research Group at Sudan University of Science and Technology.
Most systems attempt to segment characters into sub-characters; however, segmenting handwritten characters is very difficult. To avoid this difficulty, characters are treated without segmentation.
The work is divided into three main phases that together form the recognition system. The first phase is preprocessing, which applies methods that are essential for optical character recognition. In this phase, normalization and digitization are implemented; dots are then removed from some characters, and the character bodies are thinned.
The second phase is feature extraction. This phase uses the thinned images to extract the features needed to recognize them. Features are extracted with the Freeman chain code (FCC) method, and the resulting code is then normalized to 10 digits for each sample.
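As an illustration of the feature extraction phase, the following is a minimal Python sketch of an 8-direction Freeman chain code followed by length normalization to 10 digits. It is not the thesis implementation (which was written in MATLAB). The direction numbering (0 = east, counted counterclockwise), the assumption that the thinned stroke is already available as an ordered list of (row, column) pixels, and the index-based resampling used in `normalize_length` are all assumptions made for this example; the abstract does not specify how the code is normalized.

```python
# Illustrative sketch only: names and the normalization rule are assumptions.

# Map a (row step, column step) between neighbouring pixels to a Freeman code.
# Rows grow downward in image coordinates, so "north" is a row step of -1.
DIRECTIONS = {
    (0, 1): 0,    # east
    (-1, 1): 1,   # north-east
    (-1, 0): 2,   # north
    (-1, -1): 3,  # north-west
    (0, -1): 4,   # west
    (1, -1): 5,   # south-west
    (1, 0): 6,    # south
    (1, 1): 7,    # south-east
}


def freeman_chain_code(path):
    """Return the chain code of an ordered list of 8-connected (row, col) pixels."""
    code = []
    for (r0, c0), (r1, c1) in zip(path, path[1:]):
        step = (r1 - r0, c1 - c0)
        if step not in DIRECTIONS:
            raise ValueError(f"pixels {(r0, c0)} and {(r1, c1)} are not 8-neighbours")
        code.append(DIRECTIONS[step])
    return code


def normalize_length(code, n=10):
    """Resample a chain code to exactly n digits by uniform index sampling.

    This is one simple way to obtain a fixed-length feature vector; the
    thesis may use a different normalization.
    """
    if not code:
        raise ValueError("cannot normalize an empty chain code")
    return [code[i * len(code) // n] for i in range(n)]


# A short stroke that moves east, then north-east, then north.
example_path = [(2, 0), (2, 1), (1, 2), (0, 2)]
example_code = freeman_chain_code(example_path)
example_features = normalize_length(example_code, 10)
```

Here `example_code` is `[0, 1, 2]` and `example_features` is `[0, 0, 0, 0, 1, 1, 1, 2, 2, 2]`: codes shorter than 10 digits are stretched by repetition, and longer codes are subsampled.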
The third and final phase is the classification of the character bodies by a Hidden Markov Model (HMM) classifier. A total of 25,179 samples from the SUST/ALT dataset are used, 70% for training and 30% for testing. Several experiments were conducted, and the best recognition performance achieved was 59.39% on the testing set and 80.86% on the training set. These results were considered acceptable.
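To make the classification step concrete, the sketch below shows one common way to classify a chain-code sequence with discrete HMMs: one model per class, each scored with the scaled forward algorithm, with the highest log-likelihood deciding the class. The abstract does not state the HMM topology, the number of states, or the training procedure, so the two-state left-to-right toy models, the helper `peaked`, and the class labels "A" and "B" are hypothetical, and training (for example with Baum-Welch) is omitted.

```python
import math


def forward_log_likelihood(obs, start, trans, emit):
    """Log-likelihood of a symbol sequence under a discrete HMM (scaled forward pass).

    start[s]    : probability of starting in state s
    trans[p][s] : probability of moving from state p to state s
    emit[s][k]  : probability of emitting symbol k in state s
    """
    n = len(start)
    alpha = [start[s] * emit[s][obs[0]] for s in range(n)]
    log_lik = 0.0
    for t in range(len(obs)):
        if t > 0:
            alpha = [
                sum(alpha[p] * trans[p][s] for p in range(n)) * emit[s][obs[t]]
                for s in range(n)
            ]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")  # the model cannot produce this sequence
        log_lik += math.log(scale)
        alpha = [a / scale for a in alpha]  # rescale to avoid underflow
    return log_lik


def classify(obs, models):
    """Return the label of the model that gives obs the highest log-likelihood."""
    return max(models, key=lambda label: forward_log_likelihood(obs, *models[label]))


def peaked(symbol, n_symbols=8, peak=0.65):
    """Hypothetical emission row: most mass on one Freeman direction."""
    row = [(1.0 - peak) / (n_symbols - 1)] * n_symbols
    row[symbol] = peak
    return row


# Two toy left-to-right models (start, transition, emission); not trained on real data.
left_to_right = [[0.8, 0.2], [0.0, 1.0]]
models = {
    "A": ([1.0, 0.0], left_to_right, [peaked(0), peaked(2)]),  # east, then north
    "B": ([1.0, 0.0], left_to_right, [peaked(6), peaked(4)]),  # south, then west
}

predicted = classify([0, 0, 0, 0, 0, 2, 2, 2, 2, 2], models)
```

With these toy parameters, `predicted` is `"A"`, because the sequence moves east and then north, which matches the emissions of model "A". In a full system there would be one such model for each of the 18 body classes, with parameters estimated from the training samples.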
One important finding of this set of experiments is the high confusion between some classes. This is due to the variation in writers' styles, which causes similarity between characters. Errors may also occur because the features used do not have adequate discriminative capability.
The proposed system was implemented and tested in the MATLAB R2010b environment.