A Personalized Arabic Spam Detection Model

Mohammad, Asma Ibrahim Gamar Eldeen; Supervisor - Izzeldain Mohammed Osman

dc.contributor.author	Mohammad, Asma Ibrahim Gamar Eldeen
dc.contributor.author	Supervisor - Izzeldain Mohammed Osman
dc.date.accessioned	2014-08-27T06:01:02Z
dc.date.available	2014-08-27T06:01:02Z
dc.date.issued	2014-05-01
dc.identifier.citation	Mohammad, Asma Ibrahim Gamar Eldeen. A Personalized Arabic Spam Detection Model / Asma Ibrahim Gamar Eldeen Mohammad; Izzeldain Mohammed Osman. - Khartoum: Sudan University of Science & Technology, Computer Science, 2014. - 178 p.: ill.; 28 cm. - Ph.D.	en_US
dc.identifier.uri	http://repository.sustech.edu/handle/123456789/6902
dc.description	Thesis	en_US
dc.description.abstract	In a free multicultural society, what counts as a spam message differs from one user to another: content that is acceptable to one user may be unacceptable to another. What is "unwanted" by one user may be liked by another, and what a user classifies as spam at one time may not be classified as spam by the same user at another time. Therefore, there is a need to extend standard spam filters to incorporate the different interests of users and the changing interests of each user. In this thesis an attempt is made to extend spam detection to follow the preferences of the user; this is termed personalized spam detection. The main objective of this work is to design a user-personalized algorithm to detect English spam and to modify it according to the complexity of the Arabic language so that it detects Arabic and mixed (Arabic and English) spam emails. A dataset of Arabic emails, including spam and non-spam, is built. The dataset is used to train a Naïve Bayesian classifier to build an Arabic spam detection model. Cross-validation experiments are used to evaluate the model. A personalized web-based spam detection system, Permail, is developed and compared against the spam filtering capabilities of Microsoft Hotmail, Google Gmail, and Yahoo Mail to determine the effectiveness of spam filtering for each provider. The criteria used in the comparison are the quantity and percentage of spam in the Inbox. Three models are presented in this work. The first is an English spam detection model that uses a Naïve Bayesian algorithm; the model is trained using a large corpus of spam and non-spam messages and then tested using a standard dataset (from the Second Conference on Email and Anti-Spam, CEAS 2005, Stanford University, Palo Alto, CA). The results are comparable to those obtained from other models. The second model extends and modifies the first to handle Arabic and mixed (English and Arabic) data.
It is then tested against the Arabic corpus. The third model is the personalized mail system (Permail), a web-based system that classifies spam messages based on the behavior of each user and thereby provides a more personalized mail system for filtering spam emails. A comparison of three classification techniques (Decision Tree J48, ZeroR, and Logistic Regression) with the proposed Arabic spam detection model shows that the success criteria for text classification increase significantly with the proposed model. Results obtained using the corpus of message bodies are better than those obtained using message subjects. A comparison of the web-based spam detection system with the three known mail systems showed that the proposed system performed best.	en_US
dc.description.sponsorship	Sudan University of Science & Technology	en_US
dc.language.iso	other	en_US
dc.publisher	Sudan University of Science & Technology	en_US
dc.subject	Spam	en_US
dc.subject	A Personalized Arabic	en_US
dc.title	A Personalized Arabic Spam Detection Model	en_US
dc.title.alternative	نموذج عربي مشخصن لاكتشاف رسائل البريد الالكترونية غير المرغوبة	en_US
dc.type	Thesis	en_US
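
The abstract describes a Naïve Bayesian classifier trained on spam and non-spam messages, extended to Arabic and mixed Arabic/English text and personalized per user. The record does not give the thesis's actual algorithm, features, or preprocessing, so the following is only a minimal sketch of the general idea under stated assumptions: a multinomial Naive Bayes model with add-one smoothing, one model instance per user, and a simple normalization that strips Arabic diacritics. The names `tokenize`, `PersonalNaiveBayes`, and `filters` are hypothetical and are not taken from Permail.

```python
import math
import re
from collections import Counter, defaultdict

# Arabic short-vowel marks (U+064B..U+0652) and tatweel (U+0640). Stripping
# them is an illustrative normalization choice, not taken from the thesis.
_ARABIC_MARKS = re.compile(r"[\u064B-\u0652\u0640]")


def tokenize(text):
    """Lowercase, strip Arabic diacritics, and split into word tokens."""
    text = _ARABIC_MARKS.sub("", text.lower())
    # In Python 3, \w matches Unicode letters, so Arabic and Latin both work.
    return re.findall(r"\w+", text)


class PersonalNaiveBayes:
    """Multinomial Naive Bayes with add-one smoothing (hypothetical sketch)."""

    LABELS = ("spam", "ham")

    def __init__(self):
        self.word_counts = {label: Counter() for label in self.LABELS}
        self.doc_counts = Counter()

    def train(self, text, label):
        """Record one message that this user marked as 'spam' or 'ham'."""
        self.doc_counts[label] += 1
        self.word_counts[label].update(tokenize(text))

    def spam_log_odds(self, text):
        """Return log P(spam | text) - log P(ham | text), up to a constant."""
        vocab = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        vocab_size = len(vocab) or 1
        total_docs = sum(self.doc_counts.values())
        tokens = tokenize(text)
        scores = {}
        for label in self.LABELS:
            # Smoothed class prior, so an untrained model scores 0.
            score = math.log((self.doc_counts[label] + 1) / (total_docs + 2))
            n_words = sum(self.word_counts[label].values())
            for tok in tokens:
                count = self.word_counts[label][tok]
                score += math.log((count + 1) / (n_words + vocab_size))
            scores[label] = score
        return scores["spam"] - scores["ham"]

    def is_spam(self, text):
        return self.spam_log_odds(text) > 0


# One independent model per user id: the same message can be spam for one
# user and wanted mail for another, depending on each user's own labels.
filters = defaultdict(PersonalNaiveBayes)
```

In use, each user's feedback would train only that user's model, for example `filters["user_a"].train(message, "spam")`, and later feedback from the same user shifts the word counts, which is one simple way to follow a user's changing interests. A real system would also need a larger corpus, Arabic-specific stemming or normalization, and evaluation such as the cross-validation mentioned in the abstract.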