Multimodal Emotion Recognition from Speech and Text Using Heterogeneous Ensemble Techniques
Abstract
Emotion recognition and sentiment analysis serve many purposes, from analyzing
human behavior under specific conditions to enhancing the customer experience for
various services. In this paper, a multimodal approach is used to identify 4 classes
of emotions by combining both speech and text features to improve classification
accuracy. The methodology involves the implementation of several models for both
audio and text domains, combined using 4 different heterogeneous ensemble techniques: hard voting, soft voting, blending, and stacking. The effects of the different
ensemble learning methods on the accuracy of the multimodal classification task
are also investigated. The results of this study show that stacking is the best
performing ensemble technique, and the implementation outperforms several existing methods for 4-class emotion detection on the IEMOCAP dataset, achieving a
weighted accuracy of 81.2%.
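To make the distinction between the ensemble techniques concrete, the following is a minimal illustrative sketch (not the paper's implementation) contrasting soft voting with stacking using scikit-learn. The base estimators and the synthetic 4-class data here are stand-ins assumed for demonstration; in the paper, the base models are per-modality speech and text classifiers.

```python
# Illustrative sketch only: soft voting vs. stacking over two base
# classifiers, on synthetic 4-class data standing in for fused
# speech+text features. Not the authors' actual models.
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic stand-in for fused multimodal feature vectors (4 emotion classes).
X, y = make_classification(n_samples=600, n_features=20, n_informative=10,
                           n_classes=4, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

base = [("svm", SVC(probability=True, random_state=0)),
        ("lr", LogisticRegression(max_iter=1000))]

# Soft voting: average the base models' predicted class probabilities
# and take the argmax.
voter = VotingClassifier(estimators=base, voting="soft").fit(X_tr, y_tr)

# Stacking: a meta-learner is trained on the base models' out-of-fold
# predictions, learning how to weight each model per class.
stacker = StackingClassifier(
    estimators=base,
    final_estimator=LogisticRegression(max_iter=1000)).fit(X_tr, y_tr)

print("soft voting accuracy:", voter.score(X_te, y_te))
print("stacking accuracy:", stacker.score(X_te, y_te))
```

Hard voting differs from soft voting only in that each base model casts a single class label rather than a probability distribution; blending is similar to stacking but trains the meta-learner on a held-out split instead of out-of-fold predictions.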