Abstract:
Sentiment analysis of Arabic has attracted the attention of many researchers because of the increasing number of Arabic internet users and the exponential growth of Arabic content online. Despite the language's popularity, annotated resources for sentiment analysis remain limited, including comprehensive datasets, labelled corpora, polarity lexica, and reliable NLP tools. This gap motivates the study: the need to develop an opinion corpus for text written in Arabic. Because of the complexity of the Arabic language, opinion mining and sentiment analysis systems can be quite difficult to construct. Existing work on sentiment analysis is largely limited to news and blogs written in Modern Standard Arabic, while few studies have addressed social media content and web reviews written in dialectal Arabic. Moreover, most of that work has been carried out at the document and sentence level. The difficulty is compounded by the fact that, in Arabic, different forms of the same word can carry a variety of affixes, including prefixes and suffixes. Furthermore, different words with different meanings can be derived from a common three-letter root. Unlike English, whose morphology is comparatively simple, Arabic has complex, varied structures, which can be handled more effectively using natural language processing techniques. This thesis models the analysis of sentiment in Arabic customer reviews, especially those posted on Twitter. In particular, it considers the task of aspect-oriented sentiment analysis, focusing on two main subtasks: (i) identifying relevant product aspects, and (ii) detecting and classifying expressions of sentiment. The thesis experiments with dictionary-based methods and several supervised approaches. For aspect detection, it casts the task as a terminology-extraction problem. With regard to sentiment analysis, detailed studies of sentiment lexicon acquisition and sentiment polarity classification are presented.
A set of Arabic corpora of restaurant reviews has been used to evaluate the proposed sentiment analysis methodologies. In addition, an Arabic Sentiment Classifier (TASC) has been implemented at the document level, which yields an accuracy of 88.00% with the SVM classifier. Feature selection, using ontology and information gain, has also been applied. The aspect-based Arabic Sentiment Analyzer (TASA) framework takes a collection of review texts; its goal is to detect the individual aspects that reviewers have commented on and to decide whether the comments are positive or negative. For aspect selection, a hybrid approach is proposed that combines the existing information gain technique (association rules) with the ontology-based technique. The existing approach essentially extracts aspects that occur frequently in the text. However, analysis shows that not all aspects occur frequently. Therefore, the hybrid technique is proposed as a means of extracting both frequent and infrequent aspects. The proposed approach employs an incremental technique to improve the performance of existing aspect selection, combining infrequent features extracted through the ontology with frequent features based on the lexical dictionary. The results show that the proposed hybrid technique outperforms existing techniques, yielding an accuracy of 75.9%. The approaches used in this thesis show significant improvements over relevant state-of-the-art methods, such as the lexicon-based or dictionary-based approach. It can be concluded that customer review mining systems can benefit from the methods proposed in this thesis.
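
As an illustration of the information gain criterion mentioned above, the following minimal sketch scores a term by how much its presence reduces the entropy of the sentiment labels. It is not the thesis's implementation: the function names, the set-of-tokens document representation, and the toy English data are all assumptions made for this example.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def information_gain(docs, labels, term):
    """Entropy reduction of the labels when splitting on presence of `term`.

    `docs` is a list of token sets (a hypothetical representation chosen
    for this sketch); `labels` holds one sentiment label per document.
    """
    with_term = [y for d, y in zip(docs, labels) if term in d]
    without_term = [y for d, y in zip(docs, labels) if term not in d]
    n = len(labels)
    conditional = (
        (len(with_term) / n) * entropy(with_term)
        + (len(without_term) / n) * entropy(without_term)
    )
    return entropy(labels) - conditional


# Toy data (placeholders, not drawn from the thesis corpora).
docs = [
    {"food", "great"},
    {"service", "great"},
    {"food", "bad"},
    {"service", "bad"},
]
labels = ["pos", "pos", "neg", "neg"]

# Rank every term by its information gain, highest first.
vocabulary = sorted(set().union(*docs))
ranked = sorted(vocabulary, key=lambda t: information_gain(docs, labels, t), reverse=True)
```

In this toy setting, opinion words such as "great" separate the classes perfectly, while aspect words such as "food" carry no class information, which is one reason a frequency-based or information-gain-based selector may be complemented by an ontology when the goal is to find aspects.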