Please use this identifier to cite or link to this item: https://repository.sustech.edu/handle/123456789/10497
Full metadata record
DC Field | Value | Language
dc.contributor.author | Babiker, Afag Salah Aldeen |
dc.contributor.author | Supervisor - Mohammed Mustafa Ali |
dc.date.accessioned | 2015-02-12T12:04:03Z |
dc.date.available | 2015-02-12T12:04:03Z |
dc.date.issued | 2014-08 |
dc.identifier.citation | Babiker, Afag Salah Aldeen. Improving Stemming Algorithm for Arabic Text Search / Afag Salah Aldeen Babiker ; Mohammed Mustafa Ali. - Khartoum : Sudan University of Science and Technology, network, 2014. - 88p : ill ; 28cm. M.Sc. | en_US
dc.identifier.uri | http://repository.sustech.edu/handle/123456789/10497 |
dc.description | Thesis | en_US
dc.description.abstract | Building an effective stemmer for the Arabic language has always been a hot research topic in the IR field. This is because Arabic, as one of the Semitic languages, has a very rich and complex morphology. From that perspective, several approaches have been developed for Arabic stemming and for the analysis of the best approach to index Arabic words. Formally, Arabic stemming techniques can be classified into two major groups: root-based techniques (also known as heavy or morphological-analysis-based stemming) and light-stemming-based techniques (also known as affix removal stemming). Each of the two approaches has major drawbacks. On one hand, root-based stemming may result in an over-stemming problem, in which words with different meanings are erroneously grouped together. On the other hand, light stemming often results in an under-stemming problem, in which words with the same meaning are not stemmed together. Nevertheless, it has been concluded in IR that light stemming, and light-10 in particular, is the best-developed approach for indexing Arabic documents. Inspired by light-10, this research attempts to address some of the drawbacks identified in the light-10 stemmer. It simply adds some additional prefixes and suffixes. These extended affixes were added after a deep analysis and several experiments conducted by the developer to understand the nature of Arabic words. This step was accompanied by the development of a new algorithm, also inspired by light-10, to control the process of determining which prefix and/or suffix should be stripped off. Test results showed that the proposed Extended-10 stemmer yields significantly better results than the best-known Arabic stemmer so far, that is, light-10. The results also show it to be effective for improving Arabic IR. | en_US
dc.description.sponsorship | Sudan University of Science and Technology | en_US
dc.language.iso | en_US | en_US
dc.publisher | Sudan University of Science and Technology | en_US
dc.subject | Stemming Algorithm | en_US
dc.subject | Arabic Text Search | en_US
dc.subject | Arabic language | en_US
dc.subject | Stemming | en_US
dc.subject | Over-Stemming problem | en_US
dc.subject | Light 10 | en_US
dc.title | Improving Stemming Algorithm for Arabic Text Search | en_US
dc.type | Thesis | en_US
Appears in Collections: Masters Dissertations : Computer Science and Information Technology
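
The abstract describes light stemming as affix removal: strip a known prefix and known suffixes from a word, provided enough letters remain. A minimal Python sketch of that general idea follows. The affix lists, the minimum stem length, and the function names are assumptions chosen for illustration in the spirit of light-10; they are not the Extended-10 lists or the control algorithm from the thesis, which this record does not give.

```python
# Illustrative affix lists (assumed for this sketch, in the spirit of light-10).
# They are NOT the extended lists proposed in the thesis.
PREFIXES = ("وال", "بال", "كال", "فال", "ال", "لل")
SUFFIXES = ("ها", "ان", "ات", "ون", "ين", "يه", "ية", "ه", "ة", "ي")
MIN_STEM_LEN = 2  # assumed minimum number of letters left after stripping


def strip_prefix(word, prefixes=PREFIXES, min_len=MIN_STEM_LEN):
    """Remove at most one prefix, trying longer prefixes first."""
    for p in sorted(prefixes, key=len, reverse=True):
        if word.startswith(p) and len(word) - len(p) >= min_len:
            return word[len(p):]
    return word


def strip_suffixes(word, suffixes=SUFFIXES, min_len=MIN_STEM_LEN):
    """Try each suffix once, in list order, if enough letters remain."""
    for s in suffixes:
        if word.endswith(s) and len(word) - len(s) >= min_len:
            word = word[:-len(s)]
    return word


def light_stem(word):
    """Hypothetical light stemmer: prefix removal, then suffix removal."""
    return strip_suffixes(strip_prefix(word))


if __name__ == "__main__":
    for w in ("المكتبات", "والكتاب", "مكتبة"):
        print(w, "->", light_stem(w))
```

Under these assumed lists, `light_stem("المكتبات")` and `light_stem("مكتبة")` both reduce to `مكتب`, which is the grouping behaviour a light stemmer aims for. Extending `PREFIXES` or `SUFFIXES` changes which word forms are conflated, which is the kind of adjustment the abstract refers to.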

Files in This Item:
File | Description | Size | Format
Improving Stemming Algorithm .pdf | Research | 1.46 MB | Adobe PDF

