Identifying Broken Plural in Arabic Information Retrieval Systems

Ahmed, Lojain Abdalhakeem; Osman, Manal Alshazali; Supervised, Ebtihal Mustafa Alameen

URI: http://repository.sustech.edu/handle/123456789/15544

Date: 2016-10-01

Abstract:

Arabic is one of the most widespread languages in the world, and its presence on the internet is relatively recent, which makes Arabic information retrieval an important area of computer science. Information retrieval is concerned with operations such as indexing, searching, and retrieving the information a user requires; search engines are examples of information retrieval systems (IRS). An IRS faces many challenges when searching Arabic text, because Arabic is a grammatically rich language. Arabic plurals are divided into two types: Regular Plurals (RP) and Irregular/Broken Plurals (BP). An IRS can identify a regular plural because it preserves the basic structure of the word, but it fails to identify a BP because the basic structure of the word changes between the singular and plural forms. This harms indexing in an IRS: if a user types a query that contains a BP, the system retrieves only the documents that contain the plural form and misses the documents that contain the singular form, which should also be retrieved. Identifying BP is therefore one of the challenges facing Arabic IRS, and it causes document loss that leads to inaccurate results. This study aims to explain how significant a challenge BP is to Arabic IRS, and it proposes a method to recognize BP and increase Recall without affecting Precision. The proposed method consists of five stages: pattern recognition, word recognition, generating singular candidates, selecting the right singular form, and expanding the query. The study covers only one BP pattern, (فعاليل). The system's results were compared before and after applying the proposed method, with the unmodified system serving as the baseline. Based on these results, the study successfully identified BP and enhanced retrieval.
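As a rough illustration of how the five stages could fit together, here is a minimal Python sketch. It is not the authors' implementation: the abstract does not describe the stages in detail, so the function names, the regular expression, the three candidate long vowels, and the use of a small vocabulary set to confirm the singular are all assumptions made for this example. It also ignores diacritics, prefixes such as ال, and words that match the pattern by coincidence.

```python
import re

# Assumption: the unvocalized shape of the pattern فعاليل is
# C1 C2 ا C3 ي C4 (six letters), e.g. مفاتيح or عصافير.
LETTER = "[\u0621-\u064A]"
BP_PATTERN = re.compile(f"^({LETTER})({LETTER})ا({LETTER})ي({LETTER})$")


def singular_candidates(word):
    """Stages 1-3 (assumed): match the pattern, then propose singulars.

    The infixed ا is dropped and the ي is replaced by each long vowel,
    giving shapes such as مفتاح, مفتوح, مفتيح for the input مفاتيح.
    """
    match = BP_PATTERN.match(word)
    if match is None:
        return []
    c1, c2, c3, c4 = match.groups()
    return [c1 + c2 + c3 + vowel + c4 for vowel in "اوي"]


def pick_singular(candidates, vocabulary):
    """Stage 4 (assumed): keep the first candidate found in the vocabulary."""
    for candidate in candidates:
        if candidate in vocabulary:
            return candidate
    return None


def expand_query(query, vocabulary):
    """Stage 5 (assumed): append the confirmed singular to the query terms."""
    terms = query.split()
    expanded = list(terms)
    for term in terms:
        singular = pick_singular(singular_candidates(term), vocabulary)
        if singular is not None and singular not in expanded:
            expanded.append(singular)
    return expanded


# Hypothetical vocabulary standing in for the index's term list.
vocabulary = {"مفتاح", "عصفور", "كتاب"}
expanded_terms = expand_query("مفاتيح البيت", vocabulary)
```

With this toy vocabulary, the query term مفاتيح is expanded with the singular مفتاح, so documents that contain only the singular form can also be matched, while البيت is left unchanged because it does not fit the pattern. Checking candidates against a vocabulary is one simple way to avoid adding non-words to the query, which is the kind of safeguard needed to raise Recall without lowering Precision.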