SUST Repository

Design of Arabic Dialects Information Retrieval Model for Solving Regional Variation Problem


dc.contributor.author Ahmed, Rayan Omer Mohamed
dc.contributor.author Supervisor - Albaraa Abuobeida Mohamed Ali
dc.date.accessioned 2016-02-21T10:04:38Z
dc.date.available 2016-02-21T10:04:38Z
dc.date.issued 2015-11-01
dc.identifier.citation Ahmed, Rayan Omer Mohamed. Design of Arabic Dialects Information Retrieval Model for Solving Regional Variation Problem / Rayan Omer Mohamed Ahmed ; Albaraa Abuobeida Mohamed Ali .- Khartoum : Sudan University of Science and Technology, College of Computer Science and Information Technology, 2015 .- 87p. : ill. ; 28cm .- M.Sc. en_US
dc.identifier.uri http://repository.sustech.edu/handle/123456789/12808
dc.description Thesis en_US
dc.description.abstract Information retrieval (IR) is the activity of satisfying a user's information needs from a collection of unstructured data (text, images, and video). One disadvantage of most IR systems is that the search is based only on the query terms entered by the user. As a result, when an Arab user writes a query using a term in his or her dialect or in its Modern Standard Arabic (MSA) form, only documents containing that exact term are retrieved. This problem appears clearly in scientific Arabic documents. For example, documents that explain the compiler concept may be written using any one of the following Arabic words: " ‫اٌجعِع‬ ", " ‫اٌّفغش‬ " or " ُ‫اٌّخشا‬ ". Our research therefore focuses on the Arabic language, which is one of the most widely spoken languages and has many dialects. We propose a pre-retrieval (offline) method that builds a statistical dictionary for query expansion. The method is based on statistical techniques (co-occurrence and the Latent Semantic Analysis (LSA) model); it is a flexible approach because it rests on mathematical foundations, and it aims to improve the effectiveness of search results by retrieving the most relevant documents regardless of the dialect used to formulate the query. We designed our method and evaluated it against baseline methods on a small corpus collected manually using the Google search engine. The evaluation used average recall (Avg-R), average precision (Avg-P), and average F-measure (Avg-F). The results of our experiments indicate that the proposed method improves retrieval by expanding the query with regional-variation synonyms, reaching an Avg-F of 83%. The improvement over traditional IR systems is also statistically significant, with a t-test result of 5.43594E-16. en_US
dc.description.sponsorship Sudan University of Science and Technology en_US
dc.language.iso en_US en_US
dc.publisher Sudan University of Science and Technology en_US
dc.subject Design of Arabic Dialects Information en_US
dc.subject Retrieval Model for Solving Regional Variation Problem en_US
dc.subject Design of Arabic Dialects Information Retrieval Model for Solving Regional Variation Problem en_US
dc.subject Computer Science and Information Technology en_US
dc.title Design of Arabic Dialects Information Retrieval Model for Solving Regional Variation Problem en_US
dc.title.alternative تصميم نموذج لاسترجاع معلومات اللهجات العربية لحل مشكلة التباين الاقليمي en_US
dc.type Thesis en_US

