Bangla optical character recognition using Kohonen network
Author: Shatil, Adnan Md. Shoeb
This report is about Optical Character Recognition (OCR), specifically the process and steps of OCR for Bangla, the national language of Bangladesh. The aim is to convert images of text into editable text. The raw data in this report are offline printed characters, collected from computer-generated images and from scanned documents. Because no skew correction is performed in the preprocessing or processing stages, the documents were scanned with great care. After collection, the raw data are preprocessed by converting each image first to grayscale and then to black and white. A black-and-white image contains only two pixel values, so it can be treated as binary data: the character area is represented by 1 (one) and the rest of the image by 0 (zero). Next, the binary data of the whole image is placed onto a 25 × 25 pixel grid by a pixel-mapping procedure, giving a 625-bit vector (25 × 25 = 625) for each word or character. These vectors are used to train a Kohonen neural network, which forms the classification stage, so each optical character is recognized by the Kohonen network.
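The grayscale and black-and-white conversion described above can be sketched in plain Python. This is a minimal sketch, not the report's implementation: it assumes the image is already loaded as rows of (R, G, B) tuples with values from 0 to 255, uses the ITU-R BT.601 luma weights for grayscale (the same weights Pillow's `convert("L")` applies), and uses a hypothetical fixed threshold of 128 to decide which pixels are character (1) and which are background (0). The report does not state its thresholding rule, and the function names are illustrative.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B), each 0..255


def to_grayscale(rgb: List[List[Pixel]]) -> List[List[float]]:
    """Grayscale conversion with ITU-R BT.601 luma weights (an assumption)."""
    return [[0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row] for row in rgb]


def to_binary(gray: List[List[float]], threshold: float = 128.0) -> List[List[int]]:
    """Black-and-white conversion: dark (ink) pixels -> 1, light background -> 0.

    The fixed threshold of 128 is hypothetical; the report does not specify one.
    """
    return [[1 if value < threshold else 0 for value in row] for row in gray]


if __name__ == "__main__":
    # A 2 x 3 toy image: black, white and dark-gray pixels.
    image = [
        [(0, 0, 0), (255, 255, 255), (30, 30, 30)],
        [(255, 255, 255), (0, 0, 0), (250, 250, 250)],
    ]
    print(to_binary(to_grayscale(image)))  # [[1, 0, 1], [0, 1, 0]]
```

A global threshold is the simplest choice and fits carefully scanned, evenly lit pages like those described; uneven lighting would call for a per-image or adaptive threshold instead.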
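The pixel-mapping step, which places a binary image of any size onto a 25 × 25 grid and reads it out as a 625-bit vector, might look like the sketch below. The report does not say how source pixels are assigned to grid cells, so this assumes (hypothetically) that the whole image is split into 25 × 25 blocks and a cell is set to 1 if any pixel in its block is 1, which keeps thin strokes from vanishing when a large image is reduced. The names `map_to_grid` and `to_feature_vector` are illustrative.

```python
from typing import List

GRID = 25  # 25 x 25 = 625 cells, as in the report


def map_to_grid(binary: List[List[int]], size: int = GRID) -> List[List[int]]:
    """Map a binary image of any size onto a size x size grid.

    Hypothetical rule: the image is split into size x size blocks and a cell
    is 1 if any pixel in its block is 1 (character ink), else 0.
    """
    height, width = len(binary), len(binary[0])
    grid = []
    for i in range(size):
        # Row range of block i; at least one source row even if height < size.
        r0 = i * height // size
        r1 = max((i + 1) * height // size, r0 + 1)
        row = []
        for j in range(size):
            c0 = j * width // size
            c1 = max((j + 1) * width // size, c0 + 1)
            ink = any(binary[r][c] for r in range(r0, r1) for c in range(c0, c1))
            row.append(1 if ink else 0)
        grid.append(row)
    return grid


def to_feature_vector(binary: List[List[int]]) -> List[int]:
    """Flatten the 25 x 25 grid row by row into a 625-bit vector."""
    return [bit for row in map_to_grid(binary) for bit in row]


if __name__ == "__main__":
    # 100 x 100 image with a single horizontal stroke on row 50.
    image = [[1 if r == 50 else 0 for _ in range(100)] for r in range(100)]
    vector = to_feature_vector(image)
    print(len(vector), sum(vector))  # 625 25
```

Because every image, whatever its original size, ends up as exactly 625 bits, all characters can be fed to the same network input layer.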