Abstract:
Banks handle huge amounts of customer data and therefore need to invest
considerable effort in understanding the accumulated data in order to detect
customer behavior and thereby enable executive managers to make the right
decisions and avoid possible losses and wasted time and effort.
The main aim of this thesis is to use data mining techniques to distinguish
borrowers who repay their loans from those who do not, so that executive managers
can reduce the costs associated with non-paying borrowers and decrease the high
number of bad loans, to the benefit of the bank and its customers.
The dataset for this research was obtained from the UCI Machine Learning
Repository. To improve classification accuracy and obtain useful results, several
preprocessing techniques were applied: removal of irrelevant and correlated data,
data discretization, data cleaning, and target class balancing, to produce a
dataset suitable for the algorithms. Five data mining classification techniques
were then applied: Naive Bayes, J48, IBk, Multilayer Perceptron (MLP), and
Sequential Minimal Optimization (SMO). The Weka software from the University of
Waikato, with 10-fold cross-validation, was used to build and validate the
proposed models.
Experiments in this research were conducted in two stages. First, the J48
classifier was applied to the full dataset; the results of this stage show that
applying the preprocessing techniques to the dataset improved the performance of
the classifier. Second, five classification techniques were applied to the
preprocessed datasets. The results of this stage showed that the performance of
the five classification algorithms is nearly the same. Of these five
classification algorithms, the J48 classifier had the highest accuracy (84.35%).