Abstract:
Due to the recent significant growth of e-commerce applications, most of the widely used
products are marketed online. This triggered online assessment of products. As such, the
success or failure of companies is partially measured by their ability to take assessments of
their products seriously. Analysis of these assessments is necessary for ensuring continuous
customer satisfaction and further improvements of current and future products. Naturally,
understanding the preferences of customers is crucial for product manufacturers, as it helps
them in product development, marketing and consumer relationship management. On the
other hand, customers rely on other customers' online assessments when deciding
whether or not to purchase a product. As expected, assessments are given as unstructured
natural-language text. Thus, their processing requires appropriate knowledge in different
domains that include, but are not limited to, databases, information retrieval, information
extraction, machine learning, and natural language processing. However, it becomes difficult
for product manufacturers or dealers to keep track of a large number of assessments, henceforth
called opinions and/or sentiments. In the past few years, researchers have looked at different
ways of taking further advantage of opinions in what is now known as opinion mining or
sentiment analysis. The scope of opinion and sentiment includes the characteristics, functionality
and features of a product. This thesis presents novel methods that address the challenges of
opinion mining of Arabic texts. To that end, a set of Arabic language corpora from hotel and
telecommunication companies has been collected. The set was developed for evaluating the
proposed sentiment analysis methodologies. In addition, an Arabic Sentiment Classifier (ASC) has
been implemented at the document level. This research focuses on improving the
effectiveness of feature selection using Information Gain. It then proposes a generic
framework for feature-based analysis. The Arabic Sentiment Analyzer (ASA)
framework consists of two main modules: language resource construction and an opinion
miner. For the language resource construction module, the first phase constructs
an opinion lexicon of Arabic opinion words. It is based on a bootstrapping process over an
online dictionary. A few seed sentiment words have been used for bootstrapping based on the
synonym and antonym structures of the dictionary. This method is simple and efficient, and it
gives reasonable results. During the second phase, features of objects are extracted based on
frequent nouns, noun phrases, association rule mining and Natural Language Processing
(NLP) techniques. This phase takes advantage of syntactic patterns to improve the accuracy
of frequency-based techniques. Product features are stored in feature sets.
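
The lexicon bootstrapping step described above can be illustrated with a minimal sketch. It is not the thesis implementation. It assumes the online dictionary has already been loaded into two plain mappings (the hypothetical `TOY_SYNONYMS` and `TOY_ANTONYMS`), and that polarity is encoded as +1 or -1. The function name `bootstrap_lexicon` and the toy entries are illustrative only.

```python
from collections import deque

# Toy Arabic entries, assumed for illustration only (English glosses in comments).
GOOD = "جيد"         # good
EXCELLENT = "ممتاز"  # excellent
WONDERFUL = "رائع"   # wonderful
BAD = "سيئ"          # bad
POOR = "رديء"        # poor

# Hypothetical stand-ins for the synonym and antonym structures of a dictionary.
TOY_SYNONYMS = {GOOD: [EXCELLENT], EXCELLENT: [WONDERFUL], BAD: [POOR]}
TOY_ANTONYMS = {GOOD: [BAD]}
SEEDS = {GOOD: 1}  # seed word -> polarity (+1 positive, -1 negative)


def bootstrap_lexicon(seeds, synonyms, antonyms):
    """Spread polarity from seed words along synonym and antonym links."""
    lexicon = dict(seeds)
    queue = deque(seeds)  # words whose neighbours still need to be visited
    while queue:
        word = queue.popleft()
        polarity = lexicon[word]
        # Synonyms inherit the polarity; antonyms receive the opposite one.
        links = [(s, 1) for s in synonyms.get(word, [])]
        links += [(a, -1) for a in antonyms.get(word, [])]
        for neighbour, sign in links:
            if neighbour not in lexicon:  # the first label assigned wins
                lexicon[neighbour] = polarity * sign
                queue.append(neighbour)
    return lexicon


lexicon = bootstrap_lexicon(SEEDS, TOY_SYNONYMS, TOY_ANTONYMS)
```

In this sketch a word reached through conflicting paths keeps the first polarity it was given. A real lexicon would likely need a conflict-resolution rule or manual review.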
After the language resources are constructed, the opinion miner module uses a novel
information summarization and visualization approach. The approach is based on NLP
techniques for identifying sentiment sentences, determining the orientations of features and
summarizing results. The visualization module aims to give users an effective way
of browsing the set of features according to the polarity expressed by each assessment.
Practical results reflect the efficiency of the proposed system.
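
The feature-level summarization described above can be sketched as follows. This is a simplified illustration, not the thesis system. It assumes that earlier steps have already produced (feature, orientation) pairs from the sentiment sentences, with orientation +1 for positive and -1 for negative. The names `summarize` and `format_summary` and the toy hotel features are hypothetical.

```python
from collections import defaultdict

# Assumed output of the earlier steps: one (feature, orientation) pair per opinion.
TOY_OPINIONS = [
    ("room", 1),
    ("room", -1),
    ("service", 1),
    ("room", 1),
    ("service", 0),  # neutral mentions are ignored in this sketch
]


def summarize(opinions):
    """Count positive and negative opinions for each feature."""
    summary = defaultdict(lambda: {"positive": 0, "negative": 0})
    for feature, orientation in opinions:
        if orientation == 0:
            continue
        key = "positive" if orientation > 0 else "negative"
        summary[feature][key] += 1
    return dict(summary)


def format_summary(summary):
    """Render one text line per feature, sorted by feature name."""
    return [
        f"{feature}: +{counts['positive']} / -{counts['negative']}"
        for feature, counts in sorted(summary.items())
    ]


summary = summarize(TOY_OPINIONS)
lines = format_summary(summary)
```

A graphical front end could plot the same counts as paired bars per feature. The text rendering here only shows the shape of the data being browsed.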