Bangla speech recognition using 1D CNN and LSTM with different dimension reduction techniques

Nirjhor, S M Mahsanul Islam; Chowdhury, Mohammad Abidur Rahman; Sabab, Md. Nazmus

dc.contributor.advisor	Uddin, Jia
dc.contributor.author	Nirjhor, S M Mahsanul Islam
dc.contributor.author	Chowdhury, Mohammad Abidur Rahman
dc.contributor.author	Sabab, Md. Nazmus
dc.date.accessioned	2019-10-02T04:52:06Z
dc.date.available	2019-10-02T04:52:06Z
dc.date.copyright	2019
dc.date.issued	2019-08
dc.identifier.other	ID 14201031
dc.identifier.other	ID 15201049
dc.identifier.other	ID 16101135
dc.identifier.uri	http://hdl.handle.net/10361/12774
dc.description	This thesis is submitted in partial fulfillment of the requirements for the degree of Bachelor of Science in Computer Science, 2019.	en_US
dc.description	Cataloged from PDF version of thesis.
dc.description	Includes bibliographical references (pages 39-43).
dc.description.abstract	Speech recognition has long been an active topic in machine learning, but Bangla, the world's 8th most widely spoken language, has not received the attention it deserves. This research addresses speech recognition on a Bangla-language dataset. The recognition models are a 1-dimensional Convolutional Neural Network (1D CNN) and a Long Short-Term Memory (LSTM) network. Mel-frequency Cepstral Coefficients (MFCC) and the Mel Spectrogram are used as the key features for the recognition task. MFCC alone gave an accuracy of 98% with the 1D CNN and 82.35% with the LSTM. Next, three dimensionality reduction techniques, Principal Component Analysis (PCA), Kernel PCA (k-PCA) and t-distributed Stochastic Neighbor Embedding (t-SNE), were applied to the MFCC and Mel Spectrogram features in the hope of obtaining as good an efficiency as possible. This is the first attempt to apply these feature reduction methods to Bengali speech. Dimensionality reduction reduces a large number of features to fewer factors, which offers advantages such as reduced processing time and storage space. After transformation with PCA, a consistently high accuracy was obtained compared with k-PCA and t-SNE (t-SNE gave the lowest). With PCA applied to the MFCC coefficients, the accuracy was 94.54% for the 1D CNN and 82.35% for the LSTM. With t-SNE, the accuracy was 49% with the 1D CNN and 50% with the LSTM. We also computed the Mel Spectrogram of the audio data; feeding it to the models gave an accuracy of 90.74% for the 1D CNN and 91.6% for the LSTM. With k-PCA applied to the Mel Spectrogram coefficients, the accuracy was 73.95% for the 1D CNN and 72.27% for the LSTM.	en_US
dc.description.statementofresponsibility	Mohammad Abidur Rahman Chowdhury
dc.description.statementofresponsibility	S M Mahsanul Islam Nirjhor
dc.description.statementofresponsibility	Md. Nazmus Sabab
dc.format.extent	43 pages
dc.language.iso	en	en_US
dc.publisher	Brac University	en_US
dc.rights	Brac University theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission.
dc.subject	MFCC	en_US
dc.subject	PCA	en_US
dc.subject	Kernel PCA	en_US
dc.subject	t-SNE	en_US
dc.subject	1D CNN	en_US
dc.subject	RNN	en_US
dc.subject	LSTM	en_US
dc.subject.lcsh	Automatic speech recognition
dc.subject.lcsh	Machine learning
dc.title	Bangla speech recognition using 1D CNN and LSTM with different dimension reduction techniques	en_US
dc.type	Thesis	en_US
dc.contributor.department	Department of Computer Science and Engineering, Brac University
dc.description.degree	B. Computer Science
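
Illustrative note: the abstract describes reducing a large number of extracted features to fewer factors before classification. The sketch below shows one way such a PCA step might look, assuming NumPy is available. It is not the thesis's implementation: the function name pca_reduce is hypothetical, and the random matrix only stands in for flattened MFCC features, since the record does not specify the dataset or feature shapes.

```python
import numpy as np


def pca_reduce(features, n_components):
    """Project the rows of `features` onto their top principal components.

    Hypothetical helper for illustration only; not taken from the thesis.
    Returns the reduced data and the explained-variance ratio per component.
    """
    # PCA operates on mean-centered data.
    centered = features - features.mean(axis=0)
    # SVD of the centered data: the rows of `vt` are the principal
    # directions, ordered by decreasing singular value.
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    components = vt[:n_components]
    # Share of total variance captured by each principal direction.
    explained = singular_values ** 2 / np.sum(singular_values ** 2)
    return centered @ components.T, explained[:n_components]


# Assumed stand-in data: 200 clips, each flattened to 40 feature values.
rng = np.random.default_rng(0)
mfcc_like = rng.normal(size=(200, 40))

reduced, explained_ratio = pca_reduce(mfcc_like, n_components=10)
print(reduced.shape)
```

The reduced matrix has one row per clip and n_components columns, and could then be passed to a classifier such as a 1D CNN or an LSTM. Kernel PCA and t-SNE would replace the linear projection with nonlinear mappings and are not covered by this sketch.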

Files in this item

Name:: 14201031, 15201049, 16101135_C ...
Size:: 1.698Mb
Format:: PDF

This item appears in the following Collection(s)

Thesis & Report, BSc (Computer Science and Engineering) [1402]