SUST Repository

Improving Stemming Algorithm for Arabic Text Search

dc.contributor.author Babiker, Afag Salah Aldeen
dc.contributor.author Supervisor - Mohammed Mustafa Ali
dc.date.accessioned 2015-02-12T12:04:03Z
dc.date.available 2015-02-12T12:04:03Z
dc.date.issued 2014-08
dc.identifier.citation Babiker, Afag Salah Aldeen. Improving Stemming Algorithm for Arabic Text Search / Afag Salah Aldeen Babiker; Mohammed Mustafa Ali. - Khartoum: Sudan University of Science and Technology, network, 2014. - 88 p.: ill.; 28 cm. M.Sc. en_US
dc.identifier.uri http://repository.sustech.edu/handle/123456789/10497
dc.description Thesis en_US
dc.description.abstract Building an effective stemmer for the Arabic language has long been an active research topic in information retrieval (IR), because Arabic, as a Semitic language, has a very rich and complex morphology. Several approaches have therefore been developed for Arabic stemming and for determining the best way to index Arabic words. Arabic stemming techniques fall into two major classes: root-based techniques (also known as heavy stemming or morphological-analysis-based stemming) and light stemming techniques (also known as affix-removal stemming). Each approach has a major drawback. Root-based stemming may over-stem, erroneously grouping together words with different meanings. Light stemming often under-stems, failing to group together words with the same meaning. Nevertheless, IR research has concluded that light stemming, and light-10 in particular, is the best approach developed so far for indexing Arabic documents. Inspired by light-10, this research attempts to address some of the drawbacks identified in the light-10 stemmer by adding further prefixes and suffixes. These additional affixes were chosen after a detailed analysis and several experiments conducted by the developer to understand the nature of Arabic words. This step was accompanied by a new algorithm, also inspired by light-10, that controls which prefix and/or suffix should be stripped. Test results showed that the proposed Extended-10 stemmer yielded significantly better results than light-10, the best-known Arabic stemmer so far. The results also show it to be effective for improving Arabic information retrieval. en_US
dc.description.sponsorship Sudan University of Science and Technology en_US
dc.language.iso en_US en_US
dc.publisher Sudan University of Science and Technology en_US
dc.subject Stemming Algorithm en_US
dc.subject Arabic Text Search en_US
dc.subject Arabic language en_US
dc.subject Stemming en_US
dc.subject Over-Stemming problem en_US
dc.subject Light 10 en_US
dc.title Improving Stemming Algorithm for Arabic Text Search en_US
dc.type Thesis en_US
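
The affix-removal (light stemming) idea described in the abstract can be illustrated with a minimal sketch. This is not the thesis's Extended-10 algorithm, whose extended affix lists and control logic are not given in this record. The prefix and suffix lists below are assumptions modelled on how light-10 is commonly described, and the function name light_stem and its length thresholds are hypothetical choices made for illustration only.

```python
# Minimal light-stemming sketch (affix removal), loosely modelled on light-10.
# ASSUMPTION: these affix lists and length thresholds are illustrative only;
# they are not taken from the thesis and are not the Extended-10 lists.
ARTICLE_PREFIXES = ["وال", "بال", "كال", "فال", "ال", "لل"]
SUFFIXES = ["ها", "ان", "ات", "ون", "ين", "يه", "ية", "ه", "ة", "ي"]
CONJUNCTION = "و"


def light_stem(word: str) -> str:
    """Strip at most one conjunction, one article prefix, then suffixes."""
    # Remove a leading conjunction only if at least 3 letters remain.
    if word.startswith(CONJUNCTION) and len(word) - len(CONJUNCTION) >= 3:
        word = word[len(CONJUNCTION):]

    # Remove the first matching article prefix if at least 2 letters remain.
    for prefix in ARTICLE_PREFIXES:
        if word.startswith(prefix) and len(word) - len(prefix) >= 2:
            word = word[len(prefix):]
            break

    # Pass through the suffix list once, in order, stripping each suffix
    # that matches as long as at least 2 letters remain.
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 2:
            word = word[: -len(suffix)]

    return word


if __name__ == "__main__":
    for w in ["والكتاب", "المعلمون", "كتاب", "كتب"]:
        print(w, "->", light_stem(w))
```

In this sketch the singular كتاب ("book") and its broken plural كتب ("books") are left as two different stems, which is the kind of under-stemming the abstract attributes to light stemmers. A root-based stemmer would conflate them, at the risk of also merging unrelated words that share the same root.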

