Automatic subtitle generation for Bengali multimedia using deep learning
Abstract
Automatic subtitle generation is essential for making audio and video material more
inclusive and accessible. Applying this technology to Bengali, however, presents
significant challenges because of scarce resources and linguistic complexity. This
study introduces a new deep learning based system that automatically creates
subtitles for Bengali multimedia. The proposed approach uses the Wav2vec2 model
and the Common Voice Bengali dataset, a large collection of Bengali audio recordings.
The Wav2vec2 model is trained and fine-tuned on the Common Voice Bengali dataset
to convert Bengali audio into text accurately. The system combines current automatic
speech recognition (ASR) approaches with Bengali language-specific
factors to give accurate and reliable transcriptions. The
transcribed text is synchronized with the matching audio segments during subtitle
production. The generated subtitles are refined with post-processing steps,
such as capitalization and punctuation restoration, to ensure readability
and consistency. The findings of this study could substantially improve the usability
and availability of Bengali-language media across a range of sectors. The generated
subtitles may enhance the viewing experience for Bengali multimedia by supporting
better comprehension and broader access. The study demonstrates the potential of
deep learning and ASR methods to overcome the difficulties of automated
subtitle generation in Bengali, advancing multimedia accessibility and
inclusion.
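
The synchronization step described above can be illustrated with a minimal sketch. The abstract does not state which subtitle format the system produces, so the sketch assumes the widely used SubRip (.srt) format. The `Segment` type, the function names, and the sample segments are hypothetical and introduced only for illustration; in a real pipeline the text and timings would come from the speech recognition model.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One transcribed chunk with start and end times in seconds (hypothetical type)."""
    start: float
    end: float
    text: str


def format_timestamp(seconds: float) -> str:
    """Render a time in seconds in the SRT timestamp form HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def segments_to_srt(segments: list[Segment]) -> str:
    """Number each segment from 1 and separate the cues with blank lines."""
    cues = []
    for index, seg in enumerate(segments, start=1):
        cues.append(
            f"{index}\n"
            f"{format_timestamp(seg.start)} --> {format_timestamp(seg.end)}\n"
            f"{seg.text.strip()}\n"
        )
    return "\n".join(cues)


# Placeholder segments, not output of the system described in this study.
example = [
    Segment(0.0, 2.5, "আমার সোনার বাংলা"),
    Segment(2.5, 5.0, "আমি তোমায় ভালোবাসি"),
]
print(segments_to_srt(example))
```

Keeping timing and text together in one record means that post-processing, such as punctuation restoration, can change the text of a segment without disturbing its alignment with the audio.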