Development of Recommender System Model for Hotel Selection Based on Arabic Sentiment Analysis

Gad, Yousra Faisal; Supervisor: Amir Hussain

URI: http://repository.sustech.edu/handle/123456789/26002

Date: 2020-08-12

Abstract:

A recommender system is an information-filtering technology that suggests items likely to interest a group of users. These suggestions support decisions such as which items to buy, which music to listen to, or which online news to read. Recommender systems are of two types: personalized recommender systems for regular users and non-personalized recommender systems for new users. Both types face obstacles that can degrade performance, notably the cold start problem and the rating bias problem. Cold start is a common issue in which the system cannot produce any suggestion for a new user or item that has no history. Rating bias affects users with unusual tastes who rarely give feedback about products or services. This research therefore develops models based on sentiment analysis to overcome these issues.

A hotel dataset of 3,240 reviews was collected and used as a case study, and a pre-processing stage was applied before the mining and analysis tasks. A new Arabic lexicon of 2,923 sentiment terms was built from the dataset, with the polarity of each term assigned manually, and each review was labeled positive or negative using this lexicon. Support Vector Machine, Logistic Regression, Decision Tree, Random Forest and Recurrent Neural Networks were then applied to classify the reviews. Support Vector Machine and Logistic Regression achieved the highest accuracy, 89%, while Decision Tree, Random Forest and Recurrent Neural Networks reached 85%, 80% and 79% respectively. Sentiment strength was calculated by summing the frequencies of positive and negative words in each review, and a fuzzy rule uses this sentiment strength to predict rating stars. Non-personalized recommendations are then produced by ranking hotels by their average predicted rating stars.
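
The lexicon-to-rating pipeline described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the thesis implementation. The six-word `LEXICON` is hypothetical; the real lexicon has 2,923 entries. Tokenization is plain whitespace splitting. Sentiment strength is assumed to be the normalized difference (pos - neg) / (pos + neg). The fuzzy rule is assumed to use five triangular sets, one per star, defuzzified by a weighted average. All function names are illustrative.

```python
from collections import defaultdict

# Hypothetical toy lexicon (token -> polarity). The thesis lexicon has
# 2,923 manually labeled terms; these six are for illustration only.
LEXICON = {
    "ممتاز": 1,   # excellent
    "نظيف": 1,    # clean
    "رائع": 1,    # wonderful
    "سيء": -1,    # bad
    "قذر": -1,    # dirty
    "مزعج": -1,   # noisy
}


def sentiment_counts(review):
    """Count positive and negative lexicon hits (whitespace tokenization assumed)."""
    pos = neg = 0
    for token in review.split():
        polarity = LEXICON.get(token, 0)
        if polarity > 0:
            pos += 1
        elif polarity < 0:
            neg += 1
    return pos, neg


def label(review):
    """Lexicon label: positive unless negative hits outnumber positive ones (tie rule assumed)."""
    pos, neg = sentiment_counts(review)
    return "negative" if neg > pos else "positive"


def strength(pos, neg):
    """Assumed normalization of the counts to [-1, 1]; 0 when no lexicon word occurs."""
    total = pos + neg
    return 0.0 if total == 0 else (pos - neg) / total


def triangle(x, a, b, c):
    """Triangular membership: 0 outside (a, c), rising to 1 at b."""
    if x == b:
        return 1.0
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x < b else (c - x) / (c - b)


# Rule k: IF strength is S_k THEN rating is k stars. S_k is centered at
# -1, -0.5, 0, 0.5, 1 for k = 1..5 (assumed partition of [-1, 1]).
STAR_SETS = {k: (-1.5 + 0.5 * (k - 1), -1.0 + 0.5 * (k - 1), -0.5 + 0.5 * (k - 1))
             for k in range(1, 6)}


def predict_stars(review):
    """Fire every rule and defuzzify with a membership-weighted average of star values."""
    s = strength(*sentiment_counts(review))
    weights = {k: triangle(s, *abc) for k, abc in STAR_SETS.items()}
    total = sum(weights.values())
    return sum(k * w for k, w in weights.items()) / total if total else 3.0


def rank_hotels(reviews):
    """Non-personalized ranking: hotels sorted by mean predicted stars."""
    totals, counts = defaultdict(float), defaultdict(int)
    for hotel, text in reviews:
        totals[hotel] += predict_stars(text)
        counts[hotel] += 1
    means = {h: totals[h] / counts[h] for h in totals}
    return sorted(means.items(), key=lambda kv: kv[1], reverse=True)


reviews = [
    ("Hotel A", "فندق ممتاز نظيف"),       # excellent clean hotel
    ("Hotel A", "موقع رائع لكن مزعج"),    # great location but noisy
    ("Hotel B", "فندق سيء"),              # bad hotel
]
print(rank_hotels(reviews))
```

Running it prints `[('Hotel A', 4.0), ('Hotel B', 1.0)]`. Adjacent triangular sets overlap so that their memberships sum to 1 anywhere in [-1, 1]. A strength between two set centers therefore yields a fractional rating, such as 11/3 stars for two positive hits and one negative hit.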
To understand the preferences of each reviewer, a user profile is created using aspect-based sentiment analysis. Aspect terms are extracted, organized into 6 categories, and used to label each review with its category. The personalized recommender system uses these labels to match users who share a category. Several classifiers were evaluated: Support Vector Machine, Logistic Regression, Naïve Bayes, Random Forest and Recurrent Neural Networks. Naïve Bayes achieved the highest accuracy, 82.4%, while Support Vector Machine, Logistic Regression, Random Forest and Recurrent Neural Networks reached 68.2%, 72.2%, 52% and 55.25% respectively. To evaluate the proposed model, a test dataset with known ratings is passed through the model, and the linear correlation coefficient between the actual and predicted ratings is calculated. The experimental results show a correlation of 0.98 for the proposed model, indicating that sentiment analysis can address both the cold start and rating bias problems.
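
The aspect-based personalization and the correlation-based evaluation described above can be sketched in Python. This is a hedged illustration, not the thesis method, and it rests on several hypothetical choices:

- the six category names and their keyword sets (the abstract does not name the categories);
- a profile defined as the user's most frequent review category;
- recommendations ranked by the mean rating that same-category users gave to hotels the target has not reviewed;
- whitespace tokenization.

`review_category`, `user_profile`, `recommend` and `pearson` are illustrative names.

```python
from collections import Counter, defaultdict
from math import sqrt
from typing import Optional

# Hypothetical aspect keywords. The thesis groups its extracted aspect terms
# into 6 categories; these names and terms are illustrative assumptions.
ASPECTS = {
    "room": {"غرفة", "سرير"},          # room, bed
    "cleanliness": {"نظيف", "قذر"},    # clean, dirty
    "service": {"خدمة", "موظف"},       # service, employee
    "location": {"موقع", "قريب"},      # location, nearby
    "food": {"طعام", "فطور"},          # food, breakfast
    "price": {"سعر", "غالي"},          # price, expensive
}


def review_category(review: str) -> Optional[str]:
    """Label a review with the aspect category that has the most keyword hits."""
    hits = Counter(cat for token in review.split()
                   for cat, terms in ASPECTS.items() if token in terms)
    return hits.most_common(1)[0][0] if hits else None


def user_profile(texts: list) -> Optional[str]:
    """A user's profile here is simply their most frequent review category."""
    cats = Counter(c for c in map(review_category, texts) if c is not None)
    return cats.most_common(1)[0][0] if cats else None


def recommend(target, history):
    """history: (user, hotel, review_text, rating) tuples. Rank unseen hotels by the
    mean rating given by other users whose profile category matches the target's."""
    texts = defaultdict(list)
    for user, _, text, _ in history:
        texts[user].append(text)
    profiles = {u: user_profile(t) for u, t in texts.items()}
    target_cat = profiles.get(target)
    if target_cat is None:
        return []
    seen = {hotel for user, hotel, _, _ in history if user == target}
    ratings = defaultdict(list)
    for user, hotel, _, rating in history:
        if user != target and profiles[user] == target_cat and hotel not in seen:
            ratings[hotel].append(rating)
    means = {h: sum(r) / len(r) for h, r in ratings.items()}
    return sorted(means.items(), key=lambda kv: kv[1], reverse=True)


def pearson(actual, predicted):
    """Linear (Pearson) correlation coefficient between two equal-length rating lists."""
    n = len(actual)
    ma, mp = sum(actual) / n, sum(predicted) / n
    cov = sum((a - ma) * (p - mp) for a, p in zip(actual, predicted))
    sa = sqrt(sum((a - ma) ** 2 for a in actual))
    sp = sqrt(sum((p - mp) ** 2 for p in predicted))
    return cov / (sa * sp)


history = [
    ("u1", "H1", "سرير مريح في غرفة هادئة", 5),  # room-focused reviewer
    ("u2", "H2", "غرفة واسعة", 4),
    ("u2", "H3", "سرير ممتاز", 5),
    ("u3", "H4", "موقع قريب من البحر", 5),        # location-focused reviewer
]
print(recommend("u1", history))
```

Running it prints `[('H3', 5.0), ('H2', 4.0)]`. Here u2 shares u1's room focus, while u3's location-focused rating is ignored. For the evaluation step, `pearson(actual, predicted)` approaches 1 when predicted ratings rise and fall with the actual ones. On Python 3.10+, `statistics.correlation` computes the same coefficient.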