Design of Arabic Dialects Information Retrieval Model for Solving Regional Variation Problem

Ahmed, Rayan Omer Mohamed; Supervisor: Albaraa Abuobeida Mohamed Ali

Please use this identifier to cite or link to this item: https://repository.sustech.edu/handle/123456789/12808

Title:	Design of Arabic Dialects Information Retrieval Model for Solving Regional Variation Problem
Other Titles:	تصميم نموذج لاسترجاع معلومات اللهجات العربية لحل مشكلة التباين الاقليمي
Authors:	Ahmed, Rayan Omer Mohamed; Supervisor: Albaraa Abuobeida Mohamed Ali
Keywords:	Design of Arabic Dialects Information Retrieval Model for Solving Regional Variation Problem; Computer Science and Information Technology
Issue Date:	1-Nov-2015
Publisher:	Sudan University of Science and Technology
Citation:	Ahmed, Rayan Omer Mohamed. Design of Arabic Dialects Information Retrieval Model for Solving Regional Variation Problem / Rayan Omer Mohamed Ahmed; Albaraa Abuobeida Mohamed Ali. - Khartoum: Sudan University of Science and Technology, College of Computer Science and Information Technology, 2015. - 87 p.: ill.; 28 cm. - M.Sc.
Abstract:	Information retrieval (IR) is the activity of satisfying a user's information needs from a collection of unstructured data (text, images, and video). One disadvantage of most IR systems is that the search is based only on the query terms entered by the user. When an Arab user writes a query using a term in his or her dialect or in its Modern Standard Arabic (MSA) form, the system retrieves only the documents that contain that exact term. The problem is especially clear in scientific Arabic documents. For example, documents that explain the compiler concept may be written using any one of the following Arabic words: " ‫اٌجعِع‬ " , " ‫اٌّفغش‬ " or " ُ‫اٌّخشا‬ ". Our research therefore focuses on Arabic, one of the most widely spoken languages and one with many dialects. We propose a pre-retrieval (offline) method that builds a statistical dictionary for query expansion. The dictionary is built with statistical methods (a co-occurrence technique and the Latent Semantic Analysis (LSA) model). Because the approach rests on mathematical foundations it is flexible, and it aims to improve the effectiveness of search results by retrieving the most relevant documents regardless of the dialect used to formulate the query. We designed and evaluated our method and the baseline methods on a small corpus collected manually using the Google search engine. The evaluation used average recall (Avg-R), average precision (Avg-P), and average F-measure (Avg-F). The results of our experiments indicate that the proposed method is efficient in improving retrieval by expanding the query with regional-variation synonyms, reaching 83% in terms of Avg-F. The improvement over traditional IR systems is also statistically significant, with the t-test yielding 5.43594E-16.
Description:	Thesis
URI:	http://repository.sustech.edu/handle/123456789/12808
Appears in Collections:	Masters Dissertations : Computer Science and Information Technology
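
The abstract names two ideas that a short example can make concrete: expanding a query with terms that co-occur with it, and scoring results by precision, recall, and F-measure. The following Python sketch is illustrative only and is not taken from the thesis. The function names, the `min_count` threshold, and the toy corpus (with `compiler_a` and `compiler_b` standing in for two regional variants of the same term) are all assumptions, and the LSA step mentioned in the abstract is not covered.

```python
from collections import Counter
from itertools import combinations


def build_cooccurrence(docs):
    """Count how often each unordered pair of distinct terms shares a document."""
    counts = Counter()
    for doc in docs:
        terms = sorted(set(doc.split()))
        for a, b in combinations(terms, 2):
            counts[(a, b)] += 1
    return counts


def expand_query(query, counts, min_count=2):
    """Append every term that co-occurs with a query term at least min_count times.

    min_count is a hypothetical threshold chosen for this toy example.
    """
    terms = query.split()
    expanded = list(terms)
    for (a, b), n in counts.items():
        if n < min_count:
            continue
        for term, other in ((a, b), (b, a)):
            if term in terms and other not in expanded:
                expanded.append(other)
    return expanded


def precision_recall_f(retrieved, relevant):
    """Set-based precision, recall, and F-measure for a single query."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    p = hits / len(retrieved) if retrieved else 0.0
    r = hits / len(relevant) if relevant else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


# Toy corpus (an assumption): the two variants appear together in two documents.
docs = [
    "compiler_a translates source code",
    "compiler_b translates source code",
    "compiler_a compiler_b glossary",
    "compiler_a compiler_b parser",
]
counts = build_cooccurrence(docs)
expanded = expand_query("compiler_a", counts)
print(expanded)
print(precision_recall_f(retrieved=[1, 2, 3], relevant=[2, 3, 4]))
```

Averaging the per-query values returned by `precision_recall_f` over a set of test queries gives figures of the kind the abstract reports as Avg-P, Avg-R, and Avg-F.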

Files in This Item:

File	Description	Size	Format
rayanOmer.pdf	Research	2.69 MB	Adobe PDF
