Improving Stemming Algorithm for Arabic Text Search

Babiker, Afag Salah Aldeen; Supervisor - Mohammed Mustafa Ali

dc.contributor.author	Babiker, Afag Salah Aldeen
dc.contributor.author	Supervisor - Mohammed Mustafa Ali
dc.date.accessioned	2015-02-12T12:04:03Z
dc.date.available	2015-02-12T12:04:03Z
dc.date.issued	2014-08
dc.identifier.citation	Babiker, Afag Salah Aldeen. Improving Stemming Algorithm for Arabic Text Search / Afag Salah Aldeen Babiker; Mohammed Mustafa Ali. - Khartoum: Sudan University of Science and Technology, network, 2014. - 88 p.: ill.; 28 cm. M.Sc.	en_US
dc.identifier.uri	http://repository.sustech.edu/handle/123456789/10497
dc.description	Thesis	en_US
dc.description.abstract	Building an effective stemmer for Arabic has long been an active research topic in information retrieval (IR), because Arabic, as a Semitic language, has a very rich and complex morphology. Several approaches have therefore been developed for Arabic stemming and for determining the best way to index Arabic words. Arabic stemming techniques fall into two major classes: root-based techniques (also known as heavy stemming or morphological-analysis-based stemming) and light stemming techniques (also known as affix-removal stemming). Each approach has a major drawback. Root-based stemming may cause over-stemming, in which words with different meanings are erroneously grouped together. Light stemming often causes under-stemming, in which words with the same meaning are not reduced to the same stem. Nevertheless, the IR literature has concluded that light stemming, and light-10 in particular, is the best approach developed so far for indexing Arabic documents. Inspired by light-10, this research attempts to address some of the drawbacks identified in the light-10 stemmer by adding further prefixes and suffixes. These extended affixes were added after a deep analysis and several experiments conducted by the developer to understand the nature of Arabic words. This step was accompanied by a new algorithm, also inspired by light-10, that controls which prefix and/or suffix should be stripped. Test results showed that the proposed Extended-10 stemmer yielded significantly better results than light-10, the best-known Arabic stemmer so far. The results also indicate that the stemmer is effective in improving Arabic retrieval.	en_US
dc.description.sponsorship	Sudan University of Science and Technology	en_US
dc.language.iso	en_US	en_US
dc.publisher	Sudan University of Science and Technology	en_US
dc.subject	Stemming Algorithm	en_US
dc.subject	Arabic Text Search	en_US
dc.subject	Arabic language	en_US
dc.subject	Stemming	en_US
dc.subject	Over-Stemming problem	en_US
dc.subject	Light 10	en_US
dc.title	Improving Stemming Algorithm for Arabic Text Search	en_US
dc.type	Thesis	en_US
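
The affix-removal (light stemming) idea described in the abstract can be sketched roughly as follows. This is a minimal illustration only, not the thesis's Extended-10 algorithm: the abstract does not list the added prefixes and suffixes or the stripping rules, so the affix lists below are a small set commonly associated with light stemmers, and the function name light_stem, the single min_len threshold, and the one-pass suffix loop are all assumptions made for this example.

```python
# Illustrative affix lists (assumed for this sketch; NOT the Extended-10 lists,
# which the abstract does not give).
PREFIXES = ["وال", "بال", "كال", "فال", "ال", "لل", "و"]
SUFFIXES = ["ها", "ان", "ات", "ون", "ين", "يه", "ية", "ه", "ة", "ي"]


def light_stem(word, prefixes=PREFIXES, suffixes=SUFFIXES, min_len=2):
    """Strip at most one prefix, then make one pass over the suffix list.

    An affix is removed only if at least `min_len` characters remain.
    The uniform threshold is a simplification chosen for this sketch.
    """
    # Try longer prefixes first so "وال" is preferred over "و".
    for prefix in sorted(prefixes, key=len, reverse=True):
        if word.startswith(prefix) and len(word) - len(prefix) >= min_len:
            word = word[len(prefix):]
            break

    # Go through the suffix list once, in order, stripping any that match.
    for suffix in suffixes:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_len:
            word = word[:-len(suffix)]

    return word


if __name__ == "__main__":
    for w in ["والكتاب", "المعلمون", "كتابها"]:
        print(w, "->", light_stem(w))
```

Because the affix lists are parameters, an extended list of prefixes and suffixes could be passed in without changing the function. Deciding when a given affix should or should not be stripped is the part the thesis addresses with its own algorithm, and it is not reproduced here.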