Please use this identifier to cite or link to this item:
https://repository.sustech.edu/handle/123456789/12808
Title: | Design of Arabic Dialects Information Retrieval Model for Solving Regional Variation Problem |
Other Titles: | تصميم نموذج لاسترجاع معلومات اللهجات العربية لحل مشكلة التباين الاقليمي (Design of a Model for Arabic Dialects Information Retrieval to Solve the Regional Variation Problem) |
Authors: | Ahmed, Rayan Omer Mohamed; Supervisor: Albaraa Abuobeida Mohamed Ali |
Keywords: | Design of Arabic Dialects Information Retrieval Model for Solving Regional Variation Problem; Computer Science and Information Technology |
Issue Date: | 1-Nov-2015 |
Publisher: | Sudan University of Science and Technology |
Citation: | Ahmed, Rayan Omer Mohamed. Design of Arabic Dialects Information Retrieval Model for Solving Regional Variation Problem / Rayan Omer Mohamed Ahmed ; Albaraa Abuobeida Mohamed Ali. - Khartoum : Sudan University of Science and Technology, College of Computer Science and Information Technology, 2015. - 87 p. : ill. ; 28 cm. - M.Sc. |
Abstract: | Information retrieval (IR) is defined as the activity of satisfying a user's information needs from a collection of unstructured data (text, images, and video). One disadvantage of most IR systems is that the search relies only on the query terms the user enters. Consequently, when an Arab user writes a query using a term from his own dialect or its Modern Standard Arabic (MSA) form, only the documents containing that exact term are retrieved. This problem is particularly visible in Arabic scientific documents: for example, documents about the concept of a compiler may use any one of the Arabic words "اٌجعِع", "اٌّفغش" or "ُاٌّخشا". Our research therefore focuses on Arabic, a widely spoken language with many different dialects. We propose a pre-retrieval (offline) method that builds a statistically based dictionary for query expansion. The dictionary is built with statistical methods (a co-occurrence technique and the Latent Semantic Analysis (LSA) model), which form a flexible approach because they rest on mathematical foundations. The aim is to improve the effectiveness of search results by retrieving the most relevant documents regardless of the dialect used to formulate the query. We designed and evaluated our method and the baseline methods on a small corpus collected manually using the Google search engine. The evaluation used average recall (Avg-R), average precision (Avg-P) and average F-measure (Avg-F). Our experimental results indicate that the proposed method is effective at improving retrieval by expanding the query with regional-variation synonyms, achieving an Avg-F of 83%. In addition, the improvement of our model over traditional IR systems is statistically significant, with a t-test result of 5.43594E-16. |
Description: | Thesis |
URI: | http://repository.sustech.edu/handle/123456789/12808 |
Appears in Collections: | Masters Dissertations : Computer Science and Information Technology |
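The abstract above only summarizes the method, so the Python sketch below illustrates the general idea of dictionary-based query expansion under stated assumptions; it is not the thesis's implementation. It builds a small synonym dictionary offline from co-occurrence statistics, expands a query with it, and scores retrieval with precision, recall and F-measure. Everything specific here is hypothetical: terms are compared by the cosine of their co-occurrence context vectors, two terms count as regional variants only if they never share a document, the 0.8 threshold, Boolean OR retrieval, Avg-F computed as a plain mean of per-query F values, and the toy tokens `compiler_v1` to `compiler_v3` standing in for three regional words. The LSA model mentioned in the abstract is not implemented.

```python
"""Toy co-occurrence synonym dictionary for query expansion (illustrative only).

Hypothetical assumptions, not taken from the thesis: documents are pre-tokenized,
terms are compared by the cosine of their co-occurrence context vectors, and two
terms count as regional variants only if they never share a document.
LSA is not implemented here.
"""
from collections import Counter, defaultdict
from itertools import combinations
from math import sqrt


def context_vectors(docs):
    """For each term, count how many documents it shares with every other term."""
    ctx = defaultdict(Counter)
    for doc in docs:
        for a, b in combinations(sorted(set(doc)), 2):
            ctx[a][b] += 1
            ctx[b][a] += 1
    return ctx


def cosine(u, v):
    """Cosine similarity of two sparse count vectors (missing keys count as 0)."""
    dot = sum(count * v[t] for t, count in u.items())
    norm = sqrt(sum(c * c for c in u.values())) * sqrt(sum(c * c for c in v.values()))
    return dot / norm if norm else 0.0


def build_dictionary(docs, threshold=0.8):
    """Offline step: map each term to similar-context terms it never co-occurs with."""
    ctx = context_vectors(docs)
    vocab = sorted({t for doc in docs for t in doc})
    return {
        term: [
            other for other in vocab
            if other != term
            and ctx[term][other] == 0  # hypothetical rule: variants never share a document
            and cosine(ctx[term], ctx[other]) >= threshold
        ]
        for term in vocab
    }


def expand(query, dictionary):
    """Add every dictionary synonym of each query term to the query."""
    expanded = set(query)
    for term in query:
        expanded.update(dictionary.get(term, []))
    return expanded


def retrieve(query_terms, docs):
    """Boolean OR retrieval: ids of documents containing any query term."""
    return {i for i, doc in enumerate(docs) if set(query_terms) & set(doc)}


def prf(retrieved, relevant):
    """Precision, recall and F-measure for one query."""
    hits = len(retrieved & relevant)
    p = hits / len(retrieved) if retrieved else 0.0
    r = hits / len(relevant) if relevant else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


def average_prf(runs):
    """Avg-P, Avg-R, Avg-F as plain means over (retrieved, relevant) pairs."""
    scores = [prf(retrieved, relevant) for retrieved, relevant in runs]
    return tuple(sum(s[k] for s in scores) / len(scores) for k in range(3))


# Hypothetical toy corpus: compiler_v1..v3 stand in for three regional variants.
DOCS = [
    ["compiler_v1", "source", "code", "translate"],
    ["compiler_v2", "source", "code", "translate"],
    ["compiler_v3", "source", "code", "translate"],
    ["compiler_v1", "source", "code"],
    ["network", "packet", "router"],
    ["network", "packet", "protocol", "router"],
]

if __name__ == "__main__":
    dictionary = build_dictionary(DOCS)
    query, relevant = {"compiler_v2"}, {0, 1, 2, 3}
    baseline = retrieve(query, DOCS)
    expanded = retrieve(expand(query, dictionary), DOCS)
    print(dictionary["compiler_v2"])  # ['compiler_v1', 'compiler_v3']
    print(prf(baseline, relevant))    # (1.0, 0.25, 0.4)
    print(prf(expanded, relevant))    # (1.0, 1.0, 1.0)
```

On this toy corpus the unexpanded query `compiler_v2` retrieves only document 1 (recall 0.25), while the expanded query also retrieves the documents written with `compiler_v1` and `compiler_v3`. The never-share-a-document rule is what keeps topical words such as `source` and `code` out of the dictionary; a real corpus would likely need a softer criterion, or a latent-space similarity such as LSA, because regional variants can occasionally appear together.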
Files in This Item:
File | Description | Size | Format
---|---|---|---
rayanOmer.pdf | Research | 2.69 MB | Adobe PDF
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.