dc.description.abstract | Most studies on speech emotion recognition have used single-language corpora,
and little research has addressed cross-language valence speech emotion recognition. Research has
shown that the models developed for single-language speech recognition systems perform poorly
when used in different environments. Cross-language speech recognition is a desirable alternative, but it
is highly challenging because the corpora used are recorded in different environments and
under varying conditions. The differences in the quality of recording devices, elicitation techniques,
languages, and accents of speakers make the recognition task even more arduous. In this paper,
we propose a stacked ensemble learning algorithm to recognize valence emotion in a cross-language
speech environment. The proposed ensemble algorithm was developed from random decision
forest, AdaBoost, logistic regression, and gradient boosting machine and is therefore called RALOG.
In addition, we propose feature scaling using random forest recursive feature elimination and a
feature selection algorithm to boost the performance of RALOG. The algorithm has been evaluated
against four widely used ensemble algorithms to appraise its performance. Five
benchmark corpora were combined into a cross-language corpus to validate the performance of RALOG
trained with the selected acoustic features. The comparative analysis results showed that RALOG
gave better performance than the other ensemble learning algorithms investigated in this study. | en_ZW |
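
The stacking idea described in the abstract can be sketched with scikit-learn as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not say which of the four learners acts as the meta-learner, so using logistic regression in that role is an assumption, as are the order of scaling and selection, the hyperparameters, and the synthetic data that stands in for acoustic features.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    AdaBoostClassifier,
    GradientBoostingClassifier,
    RandomForestClassifier,
    StackingClassifier,
)
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for acoustic features with binary valence labels
# (assumption: the real study uses features extracted from speech corpora).
X, y = make_classification(
    n_samples=300, n_features=20, n_informative=8, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Recursive feature elimination ranked by random forest importances.
# Keeping 10 of 20 features is an arbitrary choice for illustration.
selector = RFE(
    RandomForestClassifier(n_estimators=50, random_state=0),
    n_features_to_select=10,
)

# Stacked ensemble: three tree-based base learners whose cross-validated
# predictions feed a logistic regression meta-learner (assumed role).
stack = StackingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
        ("ada", AdaBoostClassifier(n_estimators=50, random_state=0)),
        ("gbm", GradientBoostingClassifier(n_estimators=50, random_state=0)),
    ],
    final_estimator=LogisticRegression(max_iter=1000),
    cv=3,
)

# Scale, select, then classify in one pipeline so that the scaler and the
# selector are fitted on training data only.
model = make_pipeline(StandardScaler(), selector, stack)
model.fit(X_train, y_train)
accuracy = model.score(X_test, y_test)
print(f"held-out accuracy: {accuracy:.3f}")
```

Wrapping all steps in a single pipeline keeps the test split out of the scaling and selection statistics, which matters when corpora recorded under different conditions are pooled.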