Developing a Content-Based Spam Detection Method

Ahmad Abdoh, Mousa Abdul Fattah; Supervisor - Mohammad Al Hafiz Mustafa

Please use this identifier to cite or link to this item: https://repository.sustech.edu/handle/123456789/8001

Full metadata record

DC Field	Value	Language
dc.contributor.author	Ahmad Abdoh, Mousa Abdul Fattah	-
dc.contributor.author	Supervisor - Mohammad Al Hafiz Mustafa	-
dc.date.accessioned	2014-11-12T12:22:28Z	-
dc.date.available	2014-11-12T12:22:28Z	-
dc.date.issued	2008-09	-
dc.identifier.citation	Ahmad Abdoh, Mousa Abdul Fattah. Developing a Content-Based Spam Detection Method / Mousa Abdul Fattah Ahmad Abdoh; Mohammad Al Hafiz Mustafa. - Khartoum: Sudan University of Science and Technology, Computer Science, 2008. - 79 p.: ill.; 28 cm. M.Sc.	en_US
dc.identifier.uri	http://repository.sustech.edu/handle/123456789/8001	-
dc.description	Thesis	en_US
dc.description.abstract	The rapidly growing number of email users and of free email providers such as Yahoo, Hotmail, and Gmail has increased the volume of unwanted email, known as spam. The large number of spam emails that users receive daily makes automated spam filters necessary to detect and remove them. Several researchers have worked on automated techniques and tools that classify emails as wanted (legitimate) or unwanted (spam). Most of these filters are based on the naïve Bayesian method. This thesis introduces a new automated filter based on the naïve Bayesian method. The basic idea is to assign each word that appears in emails a probabilistic value indicating how likely it is to belong to spam. Because many common words appear in spam and legitimate messages at the same rate, the proposed filter has a preprocessing component that removes all common words. The researcher carefully collected these common words. In the training phase, a set of 1300 emails (legitimate and spam) was used. In this phase, the weight of every uncommon word is estimated as the probability of the word in spam email divided by the probability of the same word in legitimate email. In classification, a Bayesian classifier uses the weight table generated in the training phase to classify a given email as spam or legitimate. The proposed filter was tested on a dataset of 400 emails, 200 spam and 200 legitimate, and succeeded in detecting 90% of the spam messages.	en_US
dc.description.sponsorship	Sudan University of Science and Technology	en_US
dc.language.iso	en_US	en_US
dc.publisher	Sudan University of Science and Technology	en_US
dc.subject	Spam Detection	en_US
dc.subject	Spam	en_US
dc.subject	Content-Based	en_US
dc.subject	unwanted emails	en_US
dc.subject	Spam emails	en_US
dc.subject	legitimate	en_US
dc.title	Developing a Content-Based Spam Detection Method	en_US
dc.title.alternative	تطوير طريقة للتعرف على الرسائل الإلكترونية غير المرغوب فيها بواسطة المحتوى	en_US
dc.type	Thesis	en_US
Appears in Collections:	Masters Dissertations : Computer Science and Information Technology
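
The training and classification steps described in the abstract can be illustrated with a minimal sketch. This is not the thesis's implementation: the stop-word list, the tokenizer, the add-one smoothing, the log-sum decision rule, and the function names (tokenize, train, classify) are all assumptions made for illustration. Only the weight definition, P(word | spam) / P(word | legitimate), and the removal of common words come from the abstract.

```python
import math
import re
from collections import Counter

# Hypothetical stop-word list; the thesis uses its own hand-collected list.
COMMON_WORDS = {"the", "a", "an", "and", "to", "of", "in", "is", "you", "for"}


def tokenize(text):
    """Lowercase the text, split it into words, and drop common words."""
    words = re.findall(r"[a-z']+", text.lower())
    return [w for w in words if w not in COMMON_WORDS]


def train(spam_emails, legit_emails):
    """Build the weight table {word: P(word|spam) / P(word|legit)}.

    Probabilities are per-email document frequencies with add-one
    smoothing (an assumption) so that no weight is zero or infinite.
    """
    spam_counts = Counter(w for e in spam_emails for w in set(tokenize(e)))
    legit_counts = Counter(w for e in legit_emails for w in set(tokenize(e)))
    weights = {}
    for word in set(spam_counts) | set(legit_counts):
        p_spam = (spam_counts[word] + 1) / (len(spam_emails) + 2)
        p_legit = (legit_counts[word] + 1) / (len(legit_emails) + 2)
        weights[word] = p_spam / p_legit
    return weights


def classify(text, weights, threshold=0.0):
    """Sum the log-weights of known words; a score above threshold means spam."""
    score = sum(math.log(weights[w]) for w in set(tokenize(text)) if w in weights)
    return "spam" if score > threshold else "legitimate"


# Toy data for illustration only; not the thesis's 1300-email training set.
spam = ["win free money now", "free prize claim now"]
legit = ["meeting agenda attached", "lunch tomorrow with team"]
weights = train(spam, legit)
print(classify("claim free money", weights))
print(classify("lunch meeting tomorrow", weights))
```

Summing log-weights is equivalent to multiplying the ratios and comparing the product to 1, but avoids numeric underflow on long emails. Words never seen in training are skipped, so an email with no known words scores 0 and is treated as legitimate under the default threshold.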

Files in This Item:

File	Description	Size	Format
Developing a Content - Based....pdf	Title	42.32 kB	Adobe PDF
Abstract.pdf	Abstract	121.51 kB	Adobe PDF
Reseach.pdf (Restricted Access)	Research	508.14 kB	Adobe PDF
