Abstract:
Social networks have become part of daily life, and a huge volume of comments is generated on them every day. Colloquial Arabic is increasingly used in these comments, which makes sentiment analysis of colloquial Arabic comments an interesting research problem. The field faces recognized challenges. Some are inherited from the nature of the Arabic language itself: for example, the word “جميل” can be a person's name or an adjective that expresses a feeling (“beautiful”). Others stem from the scarcity of tools and resources. This thesis considers sentiment analysis of Arabic tweets written in Modern Standard Arabic or the Sudanese Arabic dialect. A new Sudanese dialect lexicon consisting of 2500 sentiment entries was built. Four machine learning techniques, Support Vector Machine (SVM), Naive Bayes, K-Nearest Neighbor and Decision Tree, were applied to detect the polarity of the tweets. In the first experiment, SVM achieved the best Accuracy, Recall and F-measure (95.1%, 76.5% and 84.4% respectively), while Naive Bayes achieved the best Precision (85.1%). In the second experiment, SVM achieved the best Accuracy and F-measure (75.2% and 83.9% respectively), Naive Bayes achieved the best Precision (88.41%), and Decision Tree achieved the best Recall (99.9%). In addition, the percentages of positive and negative opinions toward Sudanese government services were calculated: 9.4% of the opinions were positive and 90.6% were negative.
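For readers unfamiliar with the four reported measures, the following is a minimal sketch of how Accuracy, Precision, Recall and F-measure are conventionally computed for a binary polarity task. It assumes the standard definitions with "positive" as the target class. The `Confusion` class, the function names and the example counts are hypothetical and chosen only for illustration; they are not taken from the thesis experiments.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    """Confusion-matrix counts for binary polarity, 'positive' as target class."""
    tp: int  # positive tweets predicted positive
    fp: int  # negative tweets predicted positive
    fn: int  # positive tweets predicted negative
    tn: int  # negative tweets predicted negative


def accuracy(c: Confusion) -> float:
    """Share of all tweets whose polarity was predicted correctly."""
    total = c.tp + c.fp + c.fn + c.tn
    return (c.tp + c.tn) / total if total else 0.0


def precision(c: Confusion) -> float:
    """Share of predicted-positive tweets that are truly positive."""
    denom = c.tp + c.fp
    return c.tp / denom if denom else 0.0


def recall(c: Confusion) -> float:
    """Share of truly positive tweets that were predicted positive."""
    denom = c.tp + c.fn
    return c.tp / denom if denom else 0.0


def f_measure(c: Confusion) -> float:
    """Harmonic mean of precision and recall (F1)."""
    p, r = precision(c), recall(c)
    return 2 * p * r / (p + r) if (p + r) else 0.0


# Hypothetical counts for illustration only (not results from the thesis).
example = Confusion(tp=80, fp=20, fn=10, tn=90)

if __name__ == "__main__":
    for name, fn in [("Accuracy", accuracy), ("Precision", precision),
                     ("Recall", recall), ("F-measure", f_measure)]:
        print(f"{name}: {fn(example):.3f}")
```

Because the F-measure is the harmonic mean of Precision and Recall, a classifier can reach a very high Recall while its Accuracy and Precision stay lower, which is why the measures are reported separately.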