A high performance domain specific OCR for Bangla script

Hasnat, Md. Abul; Habib, S. M. Murtoza; Khan, Mumit

dc.contributor.author	Hasnat, Md. Abul
dc.contributor.author	Habib, S. M. Murtoza
dc.contributor.author	Khan, Mumit
dc.date.accessioned	2010-10-04T10:44:46Z
dc.date.available	2010-10-04T10:44:46Z
dc.date.copyright	2007
dc.date.issued	2007
dc.identifier.uri	http://hdl.handle.net/10361/327
dc.description	Includes bibliographical references (page 5).
dc.description.abstract	Research on recognizing Bengali script began in the mid-1980s. A variety of techniques have been applied and their performance examined. In this paper we present a high-performance, domain-specific OCR for recognizing Bengali script. We select the training data set from script of the specified domain. We choose a Hidden Markov Model (HMM) for character classification because of its simple and straightforward representation. We examine the primary error types, which mainly occur at the preprocessing level, and handle them by adding a special error-correcting module as part of the recognizer. Finally, we add a dictionary and some error-specific rules to correct probable errors after word formation. Together, these techniques significantly increase the performance of the OCR for a specific domain.	en_US
dc.description.statementofresponsibility	Md. Abul Hasnat
dc.description.statementofresponsibility	S. M. Murtoza Habib
dc.description.statementofresponsibility	Mumit Khan
dc.format.extent	5 pages
dc.language.iso	en	en_US
dc.publisher	BRAC University	en_US
dc.subject	Optical character reader (OCR)
dc.subject	Bangla language processing
dc.title	A high performance domain specific OCR for Bangla script	en_US
dc.type	Article	en_US
dc.contributor.department	Center for Research on Bangla Language Processing (CRBLP), BRAC University

Files in this item

Name: A HIGH PERFORMANCE DOMAIN ...
Size: 311.8Kb
Format: PDF

This item appears in the following Collection(s)

Conference Papers (Centre for Research on Bangla Language Processing) [40]
