Integrating Bangla script recognition support in Tesseract OCR

dc.contributor.author	Hasnat, Md. Abul
dc.contributor.author	Chowdhury, Muttakinur Rahman
dc.contributor.author	Khan, Mumit
dc.date.accessioned	2010-10-25T06:03:34Z
dc.date.available	2010-10-25T06:03:34Z
dc.date.copyright	2009
dc.date.issued	2009
dc.identifier.uri	http://hdl.handle.net/10361/635
dc.description	Includes bibliographical references (page 5).
dc.description.abstract	Tesseract is considered one of the most accurate free software OCR engines currently available. It was developed by Hewlett-Packard from 1985 to 1995 and is now maintained by Google. At present, Tesseract can recognize only English, French, Italian, German, Spanish, and Dutch. However, Tesseract can be made to recognize other scripts if the engine is trained with the requisite data. In this paper, we present a complete methodology for integrating Bangla script recognition support into Tesseract.	en_US
dc.description.statementofresponsibility	Md. Abul Hasnat
dc.description.statementofresponsibility	Muttakinur Rahman Chowdhury
dc.description.statementofresponsibility	Mumit Khan
dc.format.extent	5 pages
dc.language.iso	en	en_US
dc.publisher	BRAC University	en_US
dc.subject	Optical character reader (OCR)
dc.subject	Bangla language processing
dc.title	Integrating Bangla script recognition support in Tesseract OCR	en_US
dc.type	Article	en_US
dc.contributor.department	Center for Research on Bangla Language Processing (CRBLP), BRAC University