Bengali isolated speech recognition : a comparative analysis of the effects of data augmentation on HMM and DNN based acoustic models

Rashid, Warida; Reza, Mohi

Date

2017

Publisher

BRAC University

Abstract

We have created an isolated-word dataset, Prodorshok I, which consists of 1011 voice samples of 34 Bengali words related to navigation. The word set is intended to help design speaker-dependent and speaker-independent, voice-command-driven automated speech recognition (ASR) systems that can potentially improve human-computer interaction. This paper presents the results of an objective analysis that was undertaken using a subset of words from Prodorshok I to help assess its reliability in ASR systems that utilize Hidden Markov Models (HMM) with Gaussian emissions and Deep Neural Networks (DNN). The results show that simple data augmentation involving a small pitch shift can make surprisingly tangible improvements to accuracy levels in speech recognition, even when working with small datasets. Prodorshok I will be expanded upon and made publicly available for others to use under the Open Database License (ODbL).

Keywords

Data augmentation; Speech recognition

Description

This thesis report is submitted in partial fulfilment of the requirements for the degree of Bachelor of Science in Computer Science and Engineering, 2017.

Cataloged from PDF version of thesis report.

Includes bibliographical references (pages 31-33).

Department

Department of Computer Science and Engineering, BRAC University

Type

Thesis

Collections

Thesis & Report, BSc (Computer Science and Engineering)