Design of Arabic Dialects Information Retrieval Model for Solving Regional Variation Problem

Ahmed, Rayan Omer Mohamed; Supervisor: Albaraa Abuobeida Mohamed Ali

dc.contributor.author	Ahmed, Rayan Omer Mohamed
dc.contributor.author	Supervisor - Albaraa Abuobeida Mohamed Ali
dc.date.accessioned	2016-02-21T10:04:38Z
dc.date.available	2016-02-21T10:04:38Z
dc.date.issued	2015-11-01
dc.identifier.citation	Ahmed, Rayan Omer Mohamed. Design of Arabic Dialects Information Retrieval Model for Solving Regional Variation Problem / Rayan Omer Mohamed Ahmed ; Albaraa Abuobeida Mohamed Ali. - Khartoum : Sudan University of Science and Technology, College of Computer Science and Information Technology, 2015. - 87 p. : ill. ; 28 cm. - M.Sc.	en_US
dc.identifier.uri	http://repository.sustech.edu/handle/123456789/12808
dc.description	Thesis	en_US
dc.description.abstract	Information retrieval (IR) is the activity of satisfying a user's information needs from a collection of unstructured data (text, images, and video). One disadvantage of most IR systems is that the search is based only on the query terms entered by the user. When an Arab user writes a query using a term in his or her dialect or in Modern Standard Arabic (MSA), only documents containing that exact term are retrieved. This problem appears clearly in Arabic scientific documents. For example, documents that explain the compiler concept may use any one of the following Arabic words: " ‫اٌجعِع‬ " , " ‫اٌّفغش‬ " or " ُ‫اٌّخشا‬ ". Our research therefore focuses on the Arabic language, one of the most widely spoken languages, with many different dialects. We propose a pre-retrieval (offline) method that builds a statistical dictionary for query expansion. The method relies on statistical techniques (co-occurrence and the Latent Semantic Analysis (LSA) model), which makes it flexible because it rests on mathematical foundations. It aims to improve the effectiveness of search results by retrieving the most relevant documents regardless of the dialect used to formulate the query. We designed and evaluated our method and the baseline methods on a small corpus collected manually using the Google search engine. The evaluation used average recall (Avg-R), average precision (Avg-P) and average F-measure (Avg-F). The experimental results indicate that the proposed method improves retrieval by expanding the query with regional-variation synonyms, reaching 83% Avg-F. The improvement over traditional IR systems is also statistically significant, with the t-test yielding 5.43594E-16.	en_US
dc.description.sponsorship	Sudan University of Science and Technology	en_US
dc.language.iso	en_US	en_US
dc.publisher	Sudan University of Science and Technology	en_US
dc.subject	Design of Arabic Dialects Information	en_US
dc.subject	Retrieval Model for Solving Regional Variation Problem	en_US
dc.subject	Design of Arabic Dialects Information Retrieval Model for Solving Regional Variation Problem	en_US
dc.subject	Computer Science and Information Technology	en_US
dc.title	Design of Arabic Dialects Information Retrieval Model for Solving Regional Variation Problem	en_US
dc.title.alternative	تصميم نموذج لاسترجاع معلومات اللهجات العربية لحل مشكلة التباين الاقليمي	en_US
dc.type	Thesis	en_US
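
The abstract describes expanding a query with regional variants found through co-occurrence statistics. The following is a minimal sketch of that general idea, not the thesis's actual implementation: it assumes that variants of the same concept appear in documents sharing the same surrounding words, so it compares co-occurrence vectors with cosine similarity. The function names, the `top_k` parameter, and the placeholder tokens (`variant_a`, `variant_b`, `variant_c`) are hypothetical, and the LSA step mentioned in the abstract is not covered.

```python
from collections import Counter, defaultdict
from itertools import permutations
from math import sqrt


def cooccurrence_vectors(docs):
    """Map each term to a Counter of the terms it shares a document with."""
    vectors = defaultdict(Counter)
    for doc in docs:
        # Count each unordered pair once per document, in both directions.
        for a, b in permutations(set(doc), 2):
            vectors[a][b] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[t] * v[t] for t in u.keys() & v.keys())
    norm = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


def expand_query(term, vectors, top_k=2):
    """Return the term followed by the top_k terms with the most similar contexts."""
    if term not in vectors:
        return [term]  # unseen term: nothing to expand with
    scores = {
        other: cosine(vectors[term], vec)
        for other, vec in vectors.items()
        if other != term
    }
    # Highest score first; ties are broken alphabetically for determinism.
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return [term] + ranked[:top_k]


# Toy corpus (hypothetical): three placeholder variants used in the same context.
docs = [
    ["variant_a", "source", "code", "parse"],
    ["variant_b", "source", "code", "parse"],
    ["variant_c", "source", "code", "parse"],
    ["weather", "rain", "cloud"],
]
vectors = cooccurrence_vectors(docs)
print(expand_query("variant_a", vectors))
```

In this toy corpus the three placeholder variants never appear in the same document, yet they rank highest for each other because their contexts are identical. A dictionary built offline this way could then be consulted at query time, which matches the pre-retrieval setting the abstract describes.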