SUST Repository

A Personalized Arabic Spam Detection Model


dc.contributor.author Mohammad, Asma Ibrahim Gamar Eldeen
dc.contributor.author Supervisor - Izzeldain Mohammed Osman
dc.date.accessioned 2014-08-27T06:01:02Z
dc.date.available 2014-08-27T06:01:02Z
dc.date.issued 2014-05-01
dc.identifier.citation Mohammad, Asma Ibrahim Gamar Eldeen. A Personalized Arabic Spam Detection Model / Asma Ibrahim Gamar Eldeen Mohammad; Izzeldain Mohammed Osman. - Khartoum: Sudan University of Science & Technology, Computer Science, 2014. - 178 p.: ill.; 28 cm. - Ph.D. en_US
dc.identifier.uri http://repository.sustech.edu/handle/123456789/6902
dc.description Thesis en_US
dc.description.abstract In a free multicultural society, what counts as spam differs from one user to another: content that is acceptable to one user may be unacceptable to another. What is “unwanted” by one user may be welcomed by another, and what a user classifies as spam at one time may not be classified as spam by the same user at another time. Standard spam filters therefore need to be extended to account for both the differing interests of users and the changing interests of each user. This thesis attempts to extend spam detection to follow the preferences of the individual user, which is termed personalized spam detection. The main objective of this work is to design a user-personalized algorithm to detect English spam and to modify it, in line with the complexity of the Arabic language, to detect Arabic and mixed (Arabic and English) spam emails. A dataset of Arabic emails containing both spam and non-spam messages is built. The dataset is used to train a Naïve Bayesian classifier to build an Arabic spam detection model, and cross-validation experiments are used to evaluate the model. A web-based personalized spam detection system, Permail, is developed and compared against the spam filtering capabilities of Microsoft Hotmail, Google Gmail, and Yahoo Mail to determine the effectiveness of spam filtering for each provider. The criteria used in the comparison are the quantity and percentage of spam in the Inbox. Three models are presented in this work. The first is an English spam detection model that uses a Naïve Bayesian algorithm; it is trained on a large corpus of spam and non-spam messages and then tested on a standard dataset (from the Second Conference on Email and Anti-Spam, CEAS 2005, Stanford University, Palo Alto, CA). The results are comparable to those obtained from other models.
The second model extends and modifies the first to handle Arabic and mixed (English and Arabic) data, and is tested against the Arabic corpus. The third model is the personalized mail system (Permail), a web-based system that classifies spam messages based on the behavior of each user and thereby provides a more personalized way to filter spam emails. A comparison of three classification techniques (Decision Tree J48, ZeroR, and Logistic Regression) with the proposed Arabic spam detection model shows that the success criteria for text classification increase significantly when the proposed model is used. Results obtained using the corpus of message bodies are better than those obtained using message subjects. The comparison of the web-based spam detection system with the three known mail systems showed that the proposed system performed best. en_US
dc.description.sponsorship Sudan University of Science & Technology en_US
dc.language.iso other en_US
dc.publisher Sudan University of Science & Technology en_US
dc.subject Spam en_US
dc.subject A Personalized Arabic en_US
dc.title A Personalized Arabic Spam Detection Model en_US
dc.title.alternative نموذج عربي مشخصن لاكتشاف رسائل البريد الالكترونية غير المرغوبة en_US
dc.type Thesis en_US
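
The personalization idea in the abstract (the same message may be spam for one user and wanted by another) can be illustrated with a minimal sketch that keeps one Naïve Bayes filter per user. This is not the thesis implementation: the class name NaiveBayesFilter, the tokenizer, the add-one smoothing, and the sample messages are all assumptions made for illustration, and the thesis may handle Arabic text quite differently.

```python
import math
import re
from collections import Counter, defaultdict

# \w matches Unicode letters and digits, so Arabic and Latin words are both tokenized.
# Assumption: no stemming or diacritic handling, which real Arabic text may need.
TOKEN_RE = re.compile(r"\w+")


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return TOKEN_RE.findall(text.lower())


class NaiveBayesFilter:
    """Hypothetical multinomial Naive Bayes filter with add-one smoothing."""

    LABELS = ("spam", "ham")

    def __init__(self):
        self.word_counts = {label: Counter() for label in self.LABELS}
        self.doc_counts = {label: 0 for label in self.LABELS}

    def train(self, text, label):
        """Record one message that the user marked as 'spam' or 'ham'."""
        self.doc_counts[label] += 1
        self.word_counts[label].update(tokenize(text))

    def classify(self, text):
        """Return the label with the higher log-probability score."""
        vocab = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        total_docs = sum(self.doc_counts.values())
        tokens = tokenize(text)
        scores = {}
        for label in self.LABELS:
            # Smoothed log prior, so a class with no messages never yields log(0).
            score = math.log((self.doc_counts[label] + 1) / (total_docs + 2))
            total_words = sum(self.word_counts[label].values())
            # The extra +1 reserves probability mass for unseen words.
            denom = total_words + len(vocab) + 1
            for tok in tokens:
                score += math.log((self.word_counts[label][tok] + 1) / denom)
            scores[label] = score
        return "spam" if scores["spam"] > scores["ham"] else "ham"


# One independent filter per user: this is the personalization step.
filters = defaultdict(NaiveBayesFilter)

# User A marks a discount offer as spam; user B marks the same offer as wanted.
filters["user_a"].train("عرض خاص خصم", "spam")
filters["user_a"].train("اجتماع الفريق غدا", "ham")
filters["user_b"].train("عرض خاص خصم", "ham")
filters["user_b"].train("اربح جائزة الآن", "spam")

for user in ("user_a", "user_b"):
    print(user, filters[user].classify("خصم خاص special offer"))
```

Because each user's word counts are stored separately, feedback from one user never changes how mail is classified for another, and retraining on newer feedback lets a user's filter follow changing interests.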

