Abstract:
This thesis presents a two-phase method for an Isolated Arabic Handwritten Character Recognition (IAHCR) system. The objective of the proposed method is to achieve the best possible recognition accuracy. The method combines two stages built on two kinds of classifier, a public one and a private one, organized according to the features that characters share. In the first stage, a public classifier is built to handle all character groups, where each group contains characters with overlapping features; it assigns each character in the data set to one of these groups. In the second stage, a private classifier is created for each group to recognize and classify the characters within that group. To determine the similarities between the Arabic characters, a neural network trained with the back-propagation algorithm was applied in two experiments with different data set sizes. Characters with the same structure were then grouped into one class, yielding a new data set of fifteen groups (classes) that covers all the Arabic characters.
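The two-stage routing described above can be illustrated with a minimal sketch. It is not the thesis implementation: a simple nearest-centroid rule stands in for the actual classifiers, and the names `NearestCentroid`, `TwoStageRecognizer` and `char_to_group` are hypothetical, as is any character-to-group mapping passed in.

```python
import numpy as np


class NearestCentroid:
    """Stand-in classifier (an assumption, not the thesis classifier):
    predicts the label whose class mean is closest to the sample."""

    def fit(self, X, y):
        self.labels_ = np.unique(y)
        self.centroids_ = np.array([X[y == c].mean(axis=0) for c in self.labels_])
        return self

    def predict(self, X):
        # Distance from every sample to every class centroid.
        d = np.linalg.norm(X[:, None, :] - self.centroids_[None, :, :], axis=2)
        return self.labels_[d.argmin(axis=1)]


class TwoStageRecognizer:
    """Public classifier picks a group; a private classifier per group picks the character."""

    def __init__(self, char_to_group):
        # Hypothetical mapping from character label to group id.
        self.char_to_group = char_to_group

    def fit(self, X, chars):
        groups = np.array([self.char_to_group[c] for c in chars])
        # Stage 1: one public classifier trained on group labels.
        self.public_ = NearestCentroid().fit(X, groups)
        # Stage 2: one private classifier per group, trained only on that group's samples.
        self.private_ = {}
        for g in np.unique(groups):
            mask = groups == g
            self.private_[g] = NearestCentroid().fit(X[mask], chars[mask])
        return self

    def predict(self, X):
        groups = self.public_.predict(X)
        out = np.empty(len(X), dtype=object)
        for g in np.unique(groups):
            mask = groups == g
            out[mask] = self.private_[g].predict(X[mask])
        return out
```

In this sketch, an error made by the public classifier cannot be corrected by the private one, so the accuracy of the first stage limits the accuracy of the whole pipeline.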
The Sudan University for Sciences and Technology Arabic Recognition Group (SUST-ARG) data set was used to classify the Arabic handwritten character groups.
Two types of statistical features are employed in this thesis. The first type, called CCOB, includes the center of mass (xt, yt) of the character image, the crosshair count, the outliers (Right, Left, Top and Down) and the black ink histograms. The second set of features is based on extracting the eigenvectors and eigenvalues of the mean-shifted, resized and cropped images. The two sets of features were then combined and reduced to four features using principal component analysis (PCA).
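As a rough illustration of this kind of feature pipeline, the sketch below computes a center of mass, the outermost ink positions and the ink histograms for a binary image, then reduces a feature matrix to four components with PCA. The exact CCOB definitions are not given here, so the functions `simple_features` and `pca_reduce` are hypothetical simplifications; the crosshair count is omitted, and each image is assumed to contain at least one ink pixel.

```python
import numpy as np


def simple_features(img):
    """img: 2-D binary array with 1 = ink (assumed non-empty).
    Returns center of mass, ink extents, and column/row ink histograms."""
    ys, xs = np.nonzero(img)
    center = [xs.mean(), ys.mean()]                     # (xt, yt)
    extents = [xs.min(), xs.max(), ys.min(), ys.max()]  # left, right, top, bottom
    col_hist = img.sum(axis=0)                          # ink count per column
    row_hist = img.sum(axis=1)                          # ink count per row
    return np.concatenate([center, extents, col_hist, row_hist]).astype(float)


def pca_reduce(F, k=4):
    """Project the rows of feature matrix F onto the first k principal components."""
    centered = F - F.mean(axis=0)
    # Rows of Vt are the principal directions of the centered data.
    _, _, Vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ Vt[:k].T
```

Stacking the output of `simple_features` for every image gives the matrix `F`; `pca_reduce(F)` then returns one four-dimensional vector per image.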
The Adaptive Neural network Fuzzy Inference System (ANFIS) classifier was used at all levels of the character recognition stages, with different learning algorithms. For the first level, a general classifier for 34 classes was used. For the second level, a group classifier for 15 groups was used, together with a character classifier for the separate characters.
Experimental results on a data set of 6800 images of Arabic handwritten characters demonstrate the efficiency of the proposed recognition system.
Experiments were conducted with different sets of features for the two stages of classification in order to obtain the highest recognition rate. In the first stage, the recognition rates were 96.1%, 96.2% and 97.15% for the first set of features, the second set of features and the combined set of features, respectively. In the second stage, the recognition rates were 99.30%, 99.46% and 99.34% for the first set of features, the second set of features and the combined set of features, respectively.
The proposed two-stage recognition system achieved its highest recognition rate, 99.46%, on the tested data set.
However, the learning process for a large training data set requires more time and a large amount of memory.