Abstract:
This study aimed to measure the accuracy rate of Optical Character Recognition (OCR) packages in order to identify the best software for a given type of scanned page. It also examines the general concepts behind software evaluation, since OCR packages are considered among the most useful tools for data entry, archive building, and publishing electronic articles.
The researcher seeks answers to the following questions: Which OCR software is best for a specific printed page quality? Which packages can recognize tables, charts, and other graphical items? Which can recognize colored pages?
The thesis reviews Pattern Recognition (PR), PR applications, OCR, and OCR recognition methods and applications.
The thesis then describes the method used to evaluate the OCR packages. It concentrates on printed English and Arabic characters. Most English packages are available for free or as free trial versions. Therefore, a lot of work has been done to improve their accuracy rates. Most of the English packages included in this study achieve an accuracy rate above 97%. Among the best performers are MsDocScan, OmniPage, CuneiForm, FineReader, and SimpleOCR. For many reasons, the number of Arabic OCR packages is very small, and most of them are expensive. The thesis evaluates one Arabic package (Readiris8).
All English packages achieved a character accuracy rate above 97% except TextBridge, which achieved only 76.47%, and DocScanPro, which achieved 86.83%. MsDocScan is the best package for all document