dc.contributor.advisor | Uddin, Jia | |
dc.contributor.author | Nirjhor, S M Mahsanul Islam | |
dc.contributor.author | Chowdhury, Mohammad Abidur Rahman | |
dc.contributor.author | Sabab, Md. Nazmus | |
dc.date.accessioned | 2019-10-02T04:52:06Z | |
dc.date.available | 2019-10-02T04:52:06Z | |
dc.date.copyright | 2019 | |
dc.date.issued | 2019-08 | |
dc.identifier.other | ID 14201031 | |
dc.identifier.other | ID 15201049 | |
dc.identifier.other | ID 16101135 | |
dc.identifier.uri | http://hdl.handle.net/10361/12774 | |
dc.description | This thesis is submitted in partial fulfillment of the requirements for the degree of Bachelor of Science in Computer Science, 2019. | en_US |
dc.description | Cataloged from PDF version of thesis. | |
dc.description | Includes bibliographical references (pages 39-43). | |
dc.description.abstract | In machine learning, speech recognition has always been a hot topic, but
Bangla, the world's 8th most widely spoken language, has not received the attention it
deserves. This research addresses speech recognition using a Bangla-language dataset.
The models trained for recognition are a 1-dimensional Convolutional Neural
Network (1D CNN) and a Long Short-Term Memory (LSTM) network. For feature extraction,
Mel-frequency Cepstral Coefficients (MFCC) and the Mel spectrogram were used
as the key features for the recognition task. MFCC alone gave an accuracy of
98% for 1D CNN. MFCC used with LSTM gave an accuracy of 82.35%. Next,
dimensionality reduction was applied: Principal Component Analysis
(PCA), Kernel PCA (k-PCA), and t-distributed Stochastic Neighbor Embedding (t-SNE)
were used to transform the MFCC and Mel spectrogram features
in the hope of obtaining the best possible efficiency. This is the first attempt
to apply these feature reduction methods to Bengali speech. Dimensionality
reduction is a technique that reduces a large number of features to fewer
factors, which has several advantages, such as reducing time and required storage space.
After transformation with PCA, accuracy was consistently higher than
with k-PCA or t-SNE (t-SNE was lowest). With PCA applied to the MFCC
coefficients, the accuracy was 94.54% for 1D CNN and 82.35% for LSTM.
With t-SNE, the accuracy was 49% with 1D CNN and 50% with LSTM.
We also computed the Mel spectrogram of the audio data; feeding it to the
models gave an accuracy of 90.74% for 1D CNN and 91.6% for LSTM. With
k-PCA applied to the Mel spectrogram coefficients, the accuracy was 73.95%
for 1D CNN and 72.27% for LSTM. | en_US |
dc.description.statementofresponsibility | Mohammad Abidur Rahman Chowdhury | |
dc.description.statementofresponsibility | S M Mahsanul Islam Nirjhor | |
dc.description.statementofresponsibility | Md. Nazmus Sabab | |
dc.format.extent | 43 pages | |
dc.language.iso | en | en_US |
dc.publisher | Brac University | en_US |
dc.rights | Brac University theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission. | |
dc.subject | MFCC | en_US |
dc.subject | PCA | en_US |
dc.subject | Kernel PCA | en_US |
dc.subject | t-SNE | en_US |
dc.subject | 1D CNN | en_US |
dc.subject | RNN | en_US |
dc.subject | LSTM | en_US |
dc.subject.lcsh | Automatic speech recognition | |
dc.subject.lcsh | Machine learning | |
dc.title | Bangla speech recognition using 1D CNN and LSTM with different dimension reduction techniques | en_US |
dc.type | Thesis | en_US |
dc.contributor.department | Department of Computer Science and Engineering, Brac University | |
dc.description.degree | B. Computer Science | |