Field | Value | Language
dc.contributor.advisor | Rahman, Dr. Md. Khalilur |
dc.contributor.author | Chowdhury, Muhammed Tawfiq |
dc.contributor.author | Islam, Md. Saiful |
dc.contributor.author | Bipul, Baijed Hossain |
dc.date.accessioned | 2015-09-03T07:14:51Z |
dc.date.available | 2015-09-03T07:14:51Z |
dc.date.copyright | 2015 |
dc.date.issued | 8/24/2015 |
dc.identifier.other | ID 11101009 |
dc.identifier.other | ID 11101061 |
dc.identifier.other | ID 11101047 |
dc.identifier.uri | http://hdl.handle.net/10361/4374 |
dc.description | Cataloged from PDF version of thesis report. |
dc.description | Includes bibliographical references (page 44). |
dc.description | This thesis report is submitted in partial fulfillment of the requirements for the degree of Bachelor of Science in Computer Science and Engineering, 2015. | en_US
dc.description.abstract | Optical character recognition (OCR) is the process of extracting text from an image. The main purpose of OCR is to create editable documents from existing paper documents or image files. Developing an OCR requires a number of algorithms: noise removal, skew detection and correction, segmentation, etc., are the different steps of building one. An OCR primarily works in two phases: character detection and word detection. A more sophisticated OCR also performs sentence detection to preserve the document's structure. In this paper, we discuss the process of developing an OCR for the Bengali language. Much effort has been put into developing an OCR for Bengali; although several OCRs have been developed, none of them is completely error-free. For our thesis, we trained the Tesseract OCR engine to develop an OCR for the Bengali language. Tesseract is currently among the most accurate OCR engines available. It was originally developed at HP Labs and is currently owned by Google. We used several software tools to prepare our training files. Our OCR's library contains 18110 characters and 2617 words. We used the "Solaimanlipi" font in our project. We used 200 input files to test the accuracy of our OCR. We used Tesseract version 3.03, the latest at the time, on the Windows operating system. For clean image files, the accuracy of our software was as high as 97.56%. Note that we measured accuracy as the percentage of correctly recognized characters and words. | en_US
dc.description.statementofresponsibility | Muhammed Tawfiq Chowdhury |
dc.description.statementofresponsibility | Md. Saiful Islam |
dc.description.statementofresponsibility | Baijed Hossain Bipul |
dc.format.extent | 52 pages |
dc.language.iso | en | en_US
dc.publisher | BRAC University | en_US
dc.rights | BRAC University theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission. |
dc.subject | Computer science and engineering | en_US
dc.subject | Optical Character Recognition (OCR) | en_US
dc.subject | Bengali language | en_US
dc.subject | Tesseract | en_US
dc.subject | JTessboxEditor | en_US
dc.subject | Netbeans IDE | en_US
dc.title | Implementation of an Optical Character Recognizer (OCR) for Bengali language | en_US
dc.type | Thesis | en_US
dc.contributor.department | Department of Computer Science and Engineering, BRAC University |
dc.description.degree | B. Computer Science and Engineering |
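
The abstract reports accuracy as the percentage of correct characters and words, but it does not say how "correct" was counted. The following is a minimal Python sketch, assuming the common edit-distance (Levenshtein) definition: accuracy = (n - errors) / n * 100, where n is the number of ground-truth units and errors is the minimum number of insertions, deletions and substitutions. The function names `levenshtein`, `accuracy`, `char_accuracy` and `word_accuracy` are hypothetical and not taken from the thesis. Python's `len` counts Unicode code points, so a Bengali conjunct such as ক্ষ (ক + ্ + ষ) counts as three characters here; whether the thesis counted code points or glyphs is not stated.

```python
def levenshtein(a, b):
    """Minimum insertions, deletions and substitutions turning a into b.
    Works on any sequence: a string (per code point) or a list of words."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        cur = [i]
        for j, y in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,               # delete x from a
                           cur[j - 1] + 1,            # insert y into a
                           prev[j - 1] + (x != y)))   # substitute (free if equal)
        prev = cur
    return prev[-1]


def accuracy(ground_truth, ocr_output):
    """Assumed metric: (n - errors) / n * 100, clamped at 0 when the
    output contains more errors than the reference has units."""
    n = len(ground_truth)
    if n == 0:
        return 100.0 if len(ocr_output) == 0 else 0.0
    errors = levenshtein(ground_truth, ocr_output)
    return max(0.0, (n - errors) / n * 100)


def char_accuracy(truth, output):
    # Character level: compare the raw strings code point by code point.
    return accuracy(truth, output)


def word_accuracy(truth, output):
    # Word level: compare whitespace-separated tokens as whole units.
    return accuracy(truth.split(), output.split())


if __name__ == "__main__":
    truth = "আমি বাংলায় গান গাই"   # hypothetical ground-truth line
    output = "আমি বাংলায গান গাই"   # hypothetical OCR output
    print(f"character accuracy: {char_accuracy(truth, output):.2f}%")
    print(f"word accuracy: {word_accuracy(truth, output):.2f}%")
```

Because the same `accuracy` function is applied to strings and to word lists, a single wrong character costs one character error but a whole word error, which is why word accuracy is usually lower than character accuracy on the same output.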
