dc.contributor.advisor | Alam, Md. Golam Rabiul | |
dc.contributor.author | Mamun, Md. Adyelullahil | |
dc.contributor.author | Abdullah, Hasnat Md. | |
dc.date.accessioned | 2021-10-18T05:05:53Z | |
dc.date.available | 2021-10-18T05:05:53Z | |
dc.date.copyright | 2021 | |
dc.date.issued | 2021-01 | |
dc.identifier.other | ID 20241044 | |
dc.identifier.other | ID 20241047 | |
dc.identifier.uri | http://hdl.handle.net/10361/15324 | |
dc.description | This thesis is submitted in partial fulfillment of the requirements for the degree of Bachelor of Science in Computer Science and Engineering, 2021. | en_US |
dc.description | Cataloged from PDF version of thesis. | |
dc.description | Includes bibliographical references (pages 48-56). | |
dc.description.abstract | At present, intelligent virtual assistants (IVAs) are not only about delivering functionality
and increasing performance; they also need a socially interactive personality. Since human
conversational style is characterized by qualities such as humor, personality, and tone of voice,
these qualities have become essential for conversational intelligent virtual assistants. Our
proposed system is an anthropomorphic intelligent system that can hold a proper human-like
conversation with emotion and personality. It can also imitate any person's voice, provided
that audio recordings of that voice are available. First, the temporal audio waveform is
converted to frequency-domain data (a Mel-spectrogram), which contains distinct patterns for
audio features such as notes, pitch, rhythm, and melody. A parallel CNN and
Transformer-Encoder model is used to predict the emotion of the utterance from seven classes.
The audio is also fed to DeepSpeech, an RNN model with five hidden layers, which generates
the text transcription from the spectrogram. The transcript is then passed to a multi-domain
conversational agent that uses Blended Skill Talk, a Transformer-based retrieve-and-generate
strategy, and beam-search decoding to produce an appropriate textual response. The response
is synthesized back to audio in two steps: a synthesis model generates each Mel-spectrogram
frame conditioned on the previous frames, and WaveGlow, which is based on WaveNet and
Glow and learns an invertible mapping of data to a latent space that can be manipulated,
generates the waveform from the spectrogram. A fine-tuned version of the system could be
used in applications including, but not limited to, dubbing, voice assistants, and creating new
movies featuring the voices of past actors. | en_US |
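
The pipeline stages described in the abstract can be sketched in code. The front end converts the raw waveform into a log-scaled Mel-spectrogram; below is a minimal sketch using librosa, where the frame parameters (FFT size, hop length, number of Mel bands) are illustrative assumptions, not values taken from the thesis.

```python
import librosa
import numpy as np

def audio_to_log_mel(path, sr=22050, n_fft=1024, hop_length=256, n_mels=80):
    """Load a waveform and convert it to a log-scaled Mel-spectrogram."""
    y, _ = librosa.load(path, sr=sr)
    mel = librosa.feature.melspectrogram(
        y=y, sr=sr, n_fft=n_fft, hop_length=hop_length, n_mels=n_mels
    )
    # Log compression makes patterns such as pitch, rhythm, and melody
    # stand out more clearly for the downstream models.
    return librosa.power_to_db(mel, ref=np.max)  # shape: (n_mels, time)
```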
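The emotion-recognition stage pairs a CNN with a Transformer-Encoder over the Mel-spectrogram. The following PyTorch sketch shows one way to run the two branches in parallel and classify into the seven emotion classes; the layer counts and sizes are assumptions, and only the parallel-branch idea comes from the abstract.

```python
import torch
import torch.nn as nn

class ParallelEmotionNet(nn.Module):
    """Parallel CNN / Transformer-Encoder emotion classifier (illustrative)."""

    def __init__(self, n_mels=80, n_classes=7, d_model=128):
        super().__init__()
        # CNN branch: local time-frequency patterns (e.g. pitch contours).
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        # Transformer branch: long-range temporal structure of the utterance.
        self.proj = nn.Linear(n_mels, d_model)
        layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(32 + d_model, n_classes)

    def forward(self, mel):                                  # mel: (batch, n_mels, time)
        cnn_feat = self.cnn(mel.unsqueeze(1)).flatten(1)     # (batch, 32)
        seq = self.proj(mel.transpose(1, 2))                 # (batch, time, d_model)
        trans_feat = self.encoder(seq).mean(dim=1)           # (batch, d_model)
        return self.head(torch.cat([cnn_feat, trans_feat], dim=1))  # 7 logits

logits = ParallelEmotionNet()(torch.randn(4, 80, 200))  # dummy batch of spectrograms
```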
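For transcription, the abstract names DeepSpeech, an RNN with five hidden layers. torchaudio ships an implementation of that architecture; the feature and class counts below are placeholders, and in practice a trained checkpoint and a proper CTC decoder would be needed.

```python
import torch
import torchaudio

# Untrained skeleton of the five-hidden-layer DeepSpeech RNN; n_feature and
# n_class are illustrative (29 = a-z, space, apostrophe, CTC blank).
model = torchaudio.models.DeepSpeech(n_feature=80, n_hidden=1024, n_class=29)
model.eval()

spec = torch.randn(1, 1, 200, 80)    # (batch, channel, time, feature) dummy input
with torch.no_grad():
    log_probs = model(spec)          # (batch, time, n_class) log-probabilities
# Greedy CTC decoding would take the argmax per frame, then collapse
# repeated symbols and drop blanks to recover the transcript.
best_path = log_probs.argmax(dim=-1)
```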
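The response-generation stage uses Blended Skill Talk with a Transformer-based retrieve-and-generate strategy and beam-search decoding. As an approximation, the sketch below uses BlenderBot (which is trained on Blended Skill Talk) from Hugging Face Transformers with beam search enabled; the specific distilled checkpoint is an assumption, standing in for whatever model the thesis fine-tuned.

```python
from transformers import BlenderbotTokenizer, BlenderbotForConditionalGeneration

name = "facebook/blenderbot-400M-distill"  # assumed checkpoint
tokenizer = BlenderbotTokenizer.from_pretrained(name)
model = BlenderbotForConditionalGeneration.from_pretrained(name)

inputs = tokenizer("How was your day?", return_tensors="pt")
# Beam-search decoding keeps the 5 highest-scoring partial responses and
# returns the best complete candidate.
reply_ids = model.generate(**inputs, num_beams=5, max_length=60)
print(tokenizer.decode(reply_ids[0], skip_special_tokens=True))
```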
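Finally, the textual reply is synthesized to speech: a spectrogram-prediction model emits each Mel frame conditioned on the previous frames, and WaveGlow inverts its flow-based mapping to render the waveform. A sketch using NVIDIA's published Tacotron 2 and WaveGlow torch.hub entry points follows; a CUDA device is assumed, as in NVIDIA's own example, and the hub models stand in for the thesis's fine-tuned checkpoints.

```python
import torch

hub = 'NVIDIA/DeepLearningExamples:torchhub'
tacotron2 = torch.hub.load(hub, 'nvidia_tacotron2', model_math='fp32').to('cuda').eval()
waveglow = torch.hub.load(hub, 'nvidia_waveglow', model_math='fp32')
waveglow = waveglow.remove_weightnorm(waveglow).to('cuda').eval()
utils = torch.hub.load(hub, 'nvidia_tts_utils')

sequences, lengths = utils.prepare_input_sequence(["Hello, nice to meet you."])
with torch.no_grad():
    # The spectrogram predictor generates each Mel frame from the previous ones.
    mel, _, _ = tacotron2.infer(sequences, lengths)
    # WaveGlow's invertible flow maps latent noise plus the Mel frames to audio.
    audio = waveglow.infer(mel)
```

Cloning a specific speaker's voice, as the abstract describes, would additionally require fine-tuning the synthesis models on recordings of that speaker.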
dc.description.statementofresponsibility | Md. Adyelullahil Mamun | |
dc.description.statementofresponsibility | Hasnat Md. Abdullah | |
dc.format.extent | 56 pages | |
dc.language.iso | en | en_US |
dc.publisher | Brac University | en_US |
dc.rights | Brac University theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission. | |
dc.subject | IVA | en_US |
dc.subject | NLP | en_US |
dc.subject | SER | en_US |
dc.subject | Emotion | en_US |
dc.subject | Audio-Emotion | en_US |
dc.subject | Personal-Assistant | en_US |
dc.subject.lcsh | Intelligent control systems | |
dc.title | Affective social anthropomorphic intelligent system | en_US |
dc.type | Thesis | en_US |
dc.contributor.department | Department of Computer Science and Engineering, Brac University | |
dc.description.degree | B.Sc. in Computer Science and Engineering | |