Implementation of an Optical Character Reader (OCR) for Bengali language

Chowdhury, Muhammed Tawfiq; Islam, Md. Saiful; Bipul, Baijed Hossain; Rhaman, Md. Khalilur

dc.contributor.author	Chowdhury, Muhammed Tawfiq
dc.contributor.author	Islam, Md. Saiful
dc.contributor.author	Bipul, Baijed Hossain
dc.contributor.author	Rhaman, Md. Khalilur
dc.date.accessioned	2017-01-02T10:04:50Z
dc.date.available	2017-01-02T10:04:50Z
dc.date.issued	2016-03
dc.identifier.citation	Chowdhury, M. T., Islam, M. S., Bipul, B. H., & Rhaman, M. K. (2015). Implementation of an optical character reader (OCR) for Bengali language. Paper presented at the Proceedings of 2015 International Conference on Data and Software Engineering, ICODSE 2015, 126-131. doi:10.1109/ICODSE.2015.7436984	en_US
dc.identifier.isbn	978-146738428-5
dc.identifier.uri	http://hdl.handle.net/10361/7454
dc.description	This conference paper was presented at the International Conference on Data and Software Engineering, ICODSE 2015; Universitas Gadjah Mada, Yogyakarta, Indonesia; 25 November 2015 through 26 November 2015 [© 2015 IEEE]. The definitive version of the conference paper is available at: http://dx.doi.org/10.1109/ICODSE.2015.7436984	en_US
dc.description.abstract	Optical Character Recognition (OCR) is the process of extracting text from an image. The main purpose of an OCR system is to produce editable documents from existing paper documents or image files. Developing an OCR system requires a significant number of algorithms, and it basically works in two phases: character detection and word detection. In a more sophisticated approach, an OCR system also performs sentence detection to preserve a document's structure. Researchers have put considerable effort into developing a Bengali OCR, but none of the resulting systems is completely error free. To address this issue, version 3.03 of the Tesseract OCR engine for the Windows operating system, the latest at the time, is used to develop an OCR for the Bengali language. Moreover, 18110 characters and 2617 words are used to build the OCR's library. In this research, the 'Solaimanlipi' font and 200 input files are used to test the accuracy of the OCR. For clean image files, the accuracy of the software is as high as 97.56%. Accuracy is measured as the percentage of correct characters and words.	en_US
dc.language.iso	en	en_US
dc.publisher	© 2016 Institute of Electrical and Electronics Engineers Inc.	en_US
dc.relation.uri	http://ieeexplore.ieee.org/document/7436984/
dc.subject	Bengali	en_US
dc.subject	OCR	en_US
dc.subject	Tesseract	en_US
dc.title	Implementation of an Optical Character Reader (OCR) for Bengali language	en_US
dc.type	Conference paper	en_US
dc.description.version	Published
dc.contributor.department	Department of Computer Science and Engineering, BRAC University
dc.identifier.doi	http://dx.doi.org/10.1109/ICODSE.2015.7436984
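
The abstract states that accuracy is measured as the percentage of correct characters and words. The record does not say how the authors aligned OCR output with the ground truth, so the following is only a minimal sketch of one plausible way to compute such a figure. It assumes that a "correct" item is one that appears in the same order in both the reference text and the OCR output, found with Python's standard `difflib.SequenceMatcher`. The function names are hypothetical, and characters are counted as Unicode code points, which for Bengali means that combining vowel signs are counted separately from their base consonants.

```python
from difflib import SequenceMatcher


def accuracy(reference, recognized):
    """Percentage of reference items that also appear, in order, in the output.

    Assumption: this alignment-based definition is illustrative only; the
    paper's exact scoring procedure is not given in this record.
    """
    if not reference:
        # An empty reference is fully matched only by an empty output.
        return 100.0 if not recognized else 0.0
    matcher = SequenceMatcher(None, reference, recognized, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return 100.0 * matched / len(reference)


def char_accuracy(ref_text, ocr_text):
    """Character-level accuracy, counting Unicode code points."""
    return accuracy(list(ref_text), list(ocr_text))


def word_accuracy(ref_text, ocr_text):
    """Word-level accuracy, splitting on whitespace."""
    return accuracy(ref_text.split(), ocr_text.split())
```

Both functions take the ground-truth text first and the OCR output second, for example `char_accuracy(truth, ocr_output)`. Dividing by the length of the reference means that missing or substituted items lower the score, while extra items in the OCR output do not; a stricter metric would penalize insertions as well.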
