Designing Broken Plurals Processing Method for Enhancing the Performance of Arabic Information Retrieval Systems

Mahmoud, MohamedAlmoayed TajAlsir MohamedSaeed; Supervised - Albaraa Abuobieda Mohamed Ali

URI: http://repository.sustech.edu/handle/123456789/12298

Date: 2015-11-04

Abstract:

Information Retrieval (IR) is an area of computer science closely associated with the Internet. It is concerned with indexing, searching, and retrieving the information and documents that a user query requires. Search engines and e-library systems are examples of Information Retrieval Systems (IRS). IRS face fundamental challenges in some languages, especially Arabic, because it is considered a morphologically rich language. Plurals in Arabic are divided into two types: Sound Plurals (SP) and Broken Plurals (BP). An IRS can identify sound plurals easily because the word keeps its structure in both its singular and plural forms. It fails to recognize BP, however, because the structure of the word changes between the singular form and the plural form. This also has a negative effect on indexing in Arabic IR. For instance, if a user types a query containing a plural form, the system retrieves the documents that contain the plural form but misses documents that contain the singular form of the same word, which should also be retrieved. BP identification is therefore one of the challenges facing Arabic IRS: it causes relevant documents to be lost and so reduces the accuracy of Arabic IRS. This study explores how Arabic BP challenge Arabic IRS and proposes a method, based on word analysis, to resolve the problem of BP identification and retrieval. The proposed method consists of three stages: preprocessing, BP identification, and query expansion. The study covers three patterns of the syntax of Montaha Jemoa (SMJ): Tfaaeel (تفاعيل), Faaeel (فعاعيل), and Fyaeel (فياعيل). The results of the baseline system before applying the proposed method were compared with its results after applying the method.
The findings show that the study successfully identified broken plural words and enhanced retrieval and precision.
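
The three stages named in the abstract can be illustrated with a minimal Python sketch. This is not the thesis's implementation, whose rules the abstract does not give. It assumes that preprocessing means stripping diacritics, tatweel, and a leading definite article; that identification is a simple template match in which the letters ف, ع, and ل act as root-slot wildcards; and that query expansion uses a lookup table. The names `preprocess`, `identify_bp`, `expand_query`, and the tiny `SINGULARS` lexicon are all hypothetical. Note that wildcard matching over-generates: a word that fits Tfaaeel also fits the Faaeel template.

```python
import re

# Arabic diacritics (tashkeel) and tatweel, stripped in preprocessing (assumed step).
_DIACRITICS = re.compile(r"[\u064B-\u0652\u0640]")

# In a template, the letters ف ع ل mark root slots; every other letter is literal.
ROOT_SLOTS = {"ف", "ع", "ل"}
PATTERNS = {"Tfaaeel": "تفاعيل", "Faaeel": "فعاعيل", "Fyaeel": "فياعيل"}

# Hypothetical plural -> singular lexicon, for illustration only.
SINGULARS = {"تقارير": "تقرير", "تماثيل": "تمثال"}


def preprocess(text):
    """Stage 1 (assumed): remove diacritics, tokenize, strip the article 'ال'."""
    tokens = []
    for tok in _DIACRITICS.sub("", text).split():
        if tok.startswith("ال") and len(tok) > 4:
            tok = tok[2:]
        tokens.append(tok)
    return tokens


def matches(word, template):
    """True if word has the template's length and agrees on its literal letters."""
    if len(word) != len(template):
        return False
    return all(t in ROOT_SLOTS or t == w for w, t in zip(word, template))


def identify_bp(word):
    """Stage 2 (assumed): names of the patterns whose template the word fits."""
    return [name for name, tpl in PATTERNS.items() if matches(word, tpl)]


def expand_query(query):
    """Stage 3 (assumed): add the singular of each identified broken plural."""
    terms = []
    for tok in preprocess(query):
        terms.append(tok)
        if identify_bp(tok) and tok in SINGULARS:
            terms.append(SINGULARS[tok])
    return terms
```

With this sketch, a query containing the plural التقارير ("the reports") would be expanded to also search for the singular تقرير, so documents containing only the singular form are no longer missed.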