Abstract:
The growing problem of unsolicited bulk email, known as
spam, has generated an increasing need for reliable anti-spam
filters.
Filters of this type have so far been based mostly on manually
constructed keyword patterns. Recently, a Naïve Bayesian
classifier has been trained to detect spam messages
automatically.
To improve the performance of automated anti-spam
filters, this research introduces:
1- a new feature selection method, the Multi-Phase Feature
Selection Method.
2- a new alternative feature weighting function.
3- a simple classification algorithm, the Mean of the
Feature Weighting Classification Algorithm.
The proposed approaches are analyzed theoretically.
Experiments using 1150 email messages compared the new
methods with the previously published methods of
Sahami et al. and Androutsopoulos et al.; overall, the
results were comparable.