Bangla optical character recognition using Kohonen network

Shatil, Adnan Md. Shoeb

Date

2006-05

Publisher

BRAC University

Abstract

This report describes Optical Character Recognition (OCR), specifically the process and steps of OCR for Bangla, the national language of Bangladesh. The goal is to convert images of text into editable text. The raw data are offline printed characters, collected from computer-generated images and from scanned documents. Because no skew correction is performed in the preprocessing or processing stages, the documents are scanned with great care. The collected raw data are preprocessed by conversion to grayscale and then to black and white. The black-and-white image contains only two kinds of data, so it can be treated as binary: the character area is represented by 1 (one) and the rest of the image by 0 (zero). In the next step, the whole image is taken and its binary data are mapped onto a 25 x 25 pixel grid by a pixel mapping procedure, giving a 625-bit vector for each word or character. A Kohonen neural network is then trained on these vectors, which constitutes the classification stage. In this way an optical character is recognized with a Kohonen network.
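
The pipeline in the abstract (binarization, mapping onto a 25 x 25 grid, a 625-bit vector, Kohonen training) can be illustrated with a minimal sketch. This is not the thesis implementation. The function names, the threshold of 128, the bounding-box crop, the nearest-neighbour resampling, and the training parameters are all assumptions made for illustration. The training loop is a simplified winner-take-all variant and omits the neighbourhood function that a full self-organizing map would use.

```python
import random

SIZE = 25  # target grid side, giving 25 * 25 = 625 bits

def to_binary(gray, threshold=128):
    """Map a grayscale image (rows of 0-255 ints) to 0/1; dark pixels become 1.
    The threshold value is an assumption, not taken from the thesis."""
    return [[1 if px < threshold else 0 for px in row] for row in gray]

def crop_to_ink(binary):
    """Crop to the bounding box of the 1-pixels (assumed preprocessing step)."""
    rows = [r for r, row in enumerate(binary) if any(row)]
    cols = [c for c in range(len(binary[0])) if any(row[c] for row in binary)]
    if not rows:
        return binary  # blank image: nothing to crop
    return [row[cols[0]:cols[-1] + 1] for row in binary[rows[0]:rows[-1] + 1]]

def to_vector(binary, size=SIZE):
    """Nearest-neighbour resample onto a size x size grid, flattened row by row."""
    h, w = len(binary), len(binary[0])
    return [binary[r * h // size][c * w // size]
            for r in range(size) for c in range(size)]

def best_unit(weights, vec):
    """Index of the unit whose weight vector is closest (squared Euclidean)."""
    dists = [sum((w - x) ** 2 for w, x in zip(unit, vec)) for unit in weights]
    return dists.index(min(dists))

def train_kohonen(vectors, n_units, epochs=50, rate=0.5, seed=0):
    """Winner-take-all training with a linearly decaying learning rate."""
    rng = random.Random(seed)
    weights = [[rng.random() for _ in vectors[0]] for _ in range(n_units)]
    for epoch in range(epochs):
        lr = rate * (1 - epoch / epochs)
        for vec in vectors:
            win = weights[best_unit(weights, vec)]
            # Move only the winning unit toward the input vector.
            for i, x in enumerate(vec):
                win[i] += lr * (x - win[i])
    return weights
```

In this sketch, recognition would amount to calling best_unit on the 625-bit vector of an unseen character and looking up the label associated with the winning unit. How units are associated with labels is left open here, since the abstract does not describe it.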

Description

This thesis report is submitted in partial fulfillment of the requirements for the degree of Bachelor of Science in Computer Science and Engineering, 2006.

Cataloged from PDF version of thesis report.

Includes bibliographical references (page 32).

Department

Department of Computer Science and Engineering, BRAC University

Type

Thesis

Collections

Thesis & Report, BSc (Computer Science and Engineering)