Bangla optical character recognition using Kohonen network
Abstract
This report deals with Optical Character Recognition (OCR), specifically the recognition
process for Bangla (the national language of Bangladesh) and its steps. The overall idea
is to convert images of text into editable text. In this report the raw data are offline
printed characters. These characters are collected from computer images and also
scanned documents. Because no skew detection or correction is performed in the processing
or preprocessing stages, the documents are scanned with great care. After the raw data are
collected, they are pre-processed by grayscale conversion followed by black-and-white
conversion. Once we have the black-and-white image, there are only two types of data in
that image, so it can be treated as binary data: the character area is represented by 1
(one) and the rest of the image area by 0 (zero). In the next step we take the whole
image and map its binary data onto a 25 × 25 pixel grid by a pixel-mapping procedure.
This gives a 625-bit vector for each word or character. The vector is then used to train
a Kohonen neural network, which serves as the classification stage. In this way an
optical character is recognized with the Kohonen network.
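
The pipeline described above can be illustrated with a minimal sketch in plain Python. It
is not the implementation used in this report. The function names, the threshold of 128,
the nearest-neighbour form of the pixel mapping, the initialisation of the weights from
training samples, and the winner-take-all update without a neighbourhood function are all
assumptions made for illustration.

```python
import random

GRID = 25  # the abstract maps every image onto a 25 x 25 grid (625 bits)


def to_gray(rgb):
    """Convert rows of (r, g, b) tuples to 0-255 gray levels (BT.601 luma weights)."""
    return [[round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
            for row in rgb]


def binarize(gray, threshold=128):
    """Black-and-white conversion: dark character pixels -> 1, background -> 0.

    The threshold value is an assumption; the abstract does not state one.
    """
    return [[1 if px < threshold else 0 for px in row] for row in gray]


def to_vector(bits, size=GRID):
    """Map the binary image onto size x size cells (nearest neighbour), then flatten."""
    h, w = len(bits), len(bits[0])
    return [bits[r * h // size][c * w // size]
            for r in range(size) for c in range(size)]


def best_unit(weights, vec):
    """Index of the unit whose weights are closest (squared Euclidean) to vec."""
    dists = [sum((wi - vi) ** 2 for wi, vi in zip(w, vec)) for w in weights]
    return dists.index(min(dists))


def train(vectors, n_units, epochs=20, rate=0.5, seed=0):
    """Simplified Kohonen training: only the winning unit moves toward each input.

    Weights start as copies of randomly chosen training vectors (an assumption),
    so n_units must not exceed len(vectors).
    """
    rng = random.Random(seed)
    weights = [list(map(float, v)) for v in rng.sample(vectors, n_units)]
    for epoch in range(epochs):
        lr = rate * (1 - epoch / epochs)  # linearly decaying learning rate
        for vec in vectors:
            win = weights[best_unit(weights, vec)]
            for i, v in enumerate(vec):
                win[i] += lr * (v - win[i])
    return weights
```

In use, each character image would pass through to_gray, binarize and to_vector, the
resulting 625-bit vectors would be given to train, and an unknown character would be
assigned to the unit returned by best_unit. A full Kohonen network would also update the
neighbours of the winning unit, which this sketch omits for brevity.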