Bengali isolated speech recognition: a comparative analysis of the effects of data augmentation on HMM- and DNN-based acoustic models
Abstract
We have created an isolated-word dataset, Prodorshok I, which consists of 34 Bengali words related to navigation, with 1011 voice samples. The word set is intended to help design speaker-dependent and speaker-independent, voice-command-driven automatic speech recognition (ASR) systems that can potentially improve human-computer interaction. This paper presents the results of an objective analysis, undertaken using a subset of words from Prodorshok I, to help assess its reliability in ASR systems that use Hidden Markov Models (HMM) with Gaussian emissions and Deep Neural Networks (DNN). The results show that simple data augmentation involving a small pitch shift can yield tangible improvements in recognition accuracy, even when working with small datasets. Prodorshok I will be expanded and made publicly available for others to use under the Open Database License (ODbL).
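To make the augmentation idea concrete, the following is a minimal sketch of pitch-shift augmentation, not the procedure used in this work. It assumes a naive resampling-based shift with linear interpolation, which also changes the duration of the clip by the same factor; the function names `pitch_shift_resample` and `augment`, the one-semitone default, and the representation of a recording as a plain list of float samples are all hypothetical choices made for illustration.

```python
import math


def pitch_shift_resample(samples, semitones):
    """Shift pitch by resampling with linear interpolation.

    Assumption: a naive method chosen for illustration. It raises or
    lowers pitch by reading the signal faster or slower, so the output
    duration changes by the same factor as the frequency.
    """
    factor = 2.0 ** (semitones / 12.0)  # frequency ratio for the shift
    n_out = int(len(samples) / factor)  # output length after resampling
    out = []
    for i in range(n_out):
        pos = i * factor                # fractional read position in input
        lo = int(pos)
        hi = min(lo + 1, len(samples) - 1)
        frac = pos - lo
        out.append((1.0 - frac) * samples[lo] + frac * samples[hi])
    return out


def augment(dataset, semitones=1.0):
    """Return the original (samples, label) pairs plus shifted copies.

    Each recording contributes one copy shifted up and one shifted down
    by `semitones`, so the dataset grows to three times its size.
    """
    augmented = list(dataset)
    for samples, label in dataset:
        augmented.append((pitch_shift_resample(samples, semitones), label))
        augmented.append((pitch_shift_resample(samples, -semitones), label))
    return augmented


if __name__ == "__main__":
    # Synthetic 100 Hz tone at 8 kHz standing in for a recorded word.
    rate = 8000
    tone = [math.sin(2.0 * math.pi * 100.0 * n / rate) for n in range(800)]
    data = augment([(tone, "word_a")], semitones=1.0)
    print(len(data), [len(s) for s, _ in data])
```

In practice a duration-preserving shift (for example a phase-vocoder approach from an audio library) may be preferable, since isolated-word models can be sensitive to changes in utterance length.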