Segmentation of Bangla compound characters: underlying simple character detection from handwritten compound characters
Abstract
Bangla is one of the most widely spoken languages in the world; more than 210 million
people use it as their first or second language. The literature of Bangla has a
rich history that dates back more than a thousand years. However, some Bangla characters
are compound: two or more simple characters combine to form a single
compound character. Character recognition has been studied extensively, but the
structure of compound characters makes detecting Bangla compound
characters a difficult task. The existing method for Bangla compound characters
uses a list of compound characters as the dataset, trains models on the whole image,
and detects the characters. When this method is applied to handwritten characters,
accuracy decreases if the characters differ slightly from the training images
or consist of two different simple characters that are not in the training
images. To overcome this problem, our research focuses on detecting the character type, i.e.,
simple or compound, using the VGG16 architecture and YOLO; if the character is
compound, the method also detects the underlying simple characters inside it.
To conduct our research, we created a new Bengali handwritten character
dataset called “BanglaBorno”, because the existing datasets were limited in either the
quantity of compound characters or the quality of the images.
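
The two-stage flow described above (decide whether a character is simple or compound, then look for simple characters inside a compound one) can be sketched roughly as follows. This is a minimal illustration, not the implementation used in this work: `classify_type` stands in for a VGG16-based type classifier and `detect_simple` for a YOLO-based detector, and those names, the `Detection` record, and the `min_confidence` threshold are all hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Tuple


@dataclass
class Detection:
    """One detected simple character (hypothetical record, for illustration)."""
    label: str         # name of the predicted simple character
    confidence: float  # detector score in [0, 1]


def segment_character(
    image: Any,
    classify_type: Callable[[Any], str],
    detect_simple: Callable[[Any], List[Detection]],
    min_confidence: float = 0.5,  # assumed threshold, not taken from the paper
) -> Tuple[str, List[str]]:
    """Return the character type and, for compounds, the simple characters found."""
    # Stage 1: decide whether the image shows a simple or a compound character.
    kind = classify_type(image)
    if kind == "simple":
        return kind, []

    # Stage 2: run the detector only on compound characters and keep
    # the detections that pass the confidence threshold.
    kept = [d for d in detect_simple(image) if d.confidence >= min_confidence]

    # Ordering by confidence is an assumption made for this sketch.
    kept.sort(key=lambda d: d.confidence, reverse=True)
    return kind, [d.label for d in kept]
```

Because both stages are passed in as callables, the control flow can be exercised with stub models before any trained network is available.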