Abstract:
Human-computer interaction (HCI) is currently one of the most challenging research areas
in artificial intelligence (AI). Speech emotion recognition (SER) introduces a new means
of communication between humans and machines. Enabling a machine to understand human
emotion makes it better able to understand the speech process. Despite the great progress
and intensive research in this area, emotion identification still lacks naturalness, and
the gap between commercial interest and current performance remains to be closed. The key
is to find significant speech emotion features that map emotions correctly and efficiently.
Previous SER work has extracted and selected various sets of acoustic features; however,
the most significant features have not yet been identified. This research addresses the
problem by proposing a speech emotion recognition framework that provides an enhanced
feature extraction technique and a hybrid feature selection method.
Voice quality prosodic spectral-based feature extraction (VQPS) combines prosodic and
spectral feature extraction techniques with both new and traditional voice quality
feature extraction techniques.
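As a rough illustration only (not the paper's exact implementation), the sketch below builds a VQPS-style utterance-level vector from prosodic, spectral, and voice quality cues using librosa; the specific features, statistics, and parameters are assumptions made for the sake of the example.

```python
# Hypothetical VQPS-style feature extraction; the paper's actual feature
# set, frame settings, and statistics may differ.
import numpy as np
import librosa

def extract_vqps_features(path):
    """Return an utterance-level vector combining prosodic, spectral,
    and voice quality features (illustrative only)."""
    y, sr = librosa.load(path, sr=16000)

    # Prosodic cues: pitch contour (voiced frames only) and frame energy.
    f0, voiced, _ = librosa.pyin(y, fmin=librosa.note_to_hz('C2'),
                                 fmax=librosa.note_to_hz('C7'), sr=sr)
    f0 = f0[voiced]
    energy = librosa.feature.rms(y=y)[0]

    # Spectral cues: 13 MFCCs summarized by their mean over time.
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)

    # Voice quality cue: a crude jitter estimate (cycle-to-cycle pitch
    # period perturbation); real jitter/shimmer/HNR extraction is more
    # involved than this.
    periods = 1.0 / f0
    jitter = np.mean(np.abs(np.diff(periods))) / np.mean(periods)

    return np.concatenate([
        [np.mean(f0), np.std(f0), np.mean(energy), np.std(energy)],
        mfcc.mean(axis=1),
        [jitter],
    ])
```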
At the same time, balanced hybrid filter-based feature selection (BHFFS) consists of two
layers: a balancing layer and a hybrid filter-based layer.
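A minimal sketch of such a two-layer selection pipeline is shown below, using scikit-learn; the balancing strategy (random oversampling) and the filter criteria (ANOVA F-score plus mutual information, combined by average rank) are assumptions here, not necessarily the paper's choices.

```python
# Hypothetical BHFFS-style selection: layer 1 balances the classes,
# layer 2 keeps the features ranked best by two filter criteria.
import numpy as np
from sklearn.feature_selection import f_classif, mutual_info_classif

def balance_by_oversampling(X, y, seed=0):
    """Layer 1: equalize class sizes by resampling minority classes."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    idx = np.concatenate([
        rng.choice(np.where(y == c)[0], size=counts.max(), replace=True)
        for c in classes
    ])
    return X[idx], y[idx]

def hybrid_filter_select(X, y, k=50):
    """Layer 2: rank features with two filters; keep the top k by
    average rank (rank 0 = best under a given filter)."""
    f_scores, _ = f_classif(X, y)
    mi_scores = mutual_info_classif(X, y, random_state=0)
    rank_f = np.argsort(np.argsort(-f_scores))
    rank_mi = np.argsort(np.argsort(-mi_scores))
    return np.argsort(rank_f + rank_mi)[:k]  # indices of kept features

# Usage: balance first, then select on the balanced data.
# Xb, yb = balance_by_oversampling(X, y)
# keep = hybrid_filter_select(Xb, yb, k=50)
# X_reduced = X[:, keep]
```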
The proposed feature extraction technique and selection method were successfully evaluated
on the EMO-DB dataset. The experimental results show that VQPS improves performance over
previous work and demonstrate that voice quality features are important in developing SER
systems. Likewise, BHFFS outperforms previous work.