Show simple item record

dc.contributor.author: Manzur, Shahrin
dc.contributor.author: Islam, Shafiqul
dc.contributor.author: Foysal, Abu
dc.contributor.author: Chowdhury, Aparajita
dc.date.accessioned: 2016-01-19T13:20:26Z
dc.date.available: 2016-01-19T13:20:26Z
dc.date.issued: 2015-12
dc.identifier.other: ID 12101113
dc.identifier.other: ID 12101128
dc.identifier.other: ID 12101131
dc.identifier.other: ID 12301056
dc.identifier.uri: http://hdl.handle.net/10361/4894
dc.description: This thesis report is submitted in partial fulfillment of the requirements for the degree of Bachelor of Science in Computer Science and Engineering, 2015. [en_US]
dc.description.abstract: In this paper, we illustrate our attempt to create editable documents from images by retrieving the text. The process is widely known as Optical Character Recognition (OCR). We have tried to build an Android application for detecting Bengali characters. Previously, several attempts have been made in developing a Bengali OCR. However, there were a few limitations which drove us to work on this project. In order to recognize more characters and joint letters, we decided to work on reducing the error rate to preserve more texts. To serve our purpose, we found the Tesseract OCR engine and Leptonica Image Processing Library to be the best option. Tesseract is used in order to recognize the characters and Leptonica is used to build an Android application by extracting data from the text. We are using the Tesseract 3.03 version currently available to work on this project. Moreover, we demonstrate how we obtained better results by manipulating Tesseract along with Serak to create box files and trained data. In addition to that, we discuss how we dealt with joint letters, dangerous ambiguity and contrast issues in order to increase efficiency. Furthermore, we explain our analyzed data, our progress and the future scopes of improvement. [en_US]
dc.language.iso: en [en_US]
dc.publisher: BRAC University [en_US]
dc.subject: Optical Character Recognition (OCR) [en_US]
dc.subject: Tesseract [en_US]
dc.subject: Bangla language [en_US]
dc.subject: Android [en_US]
dc.subject: Leptonica [en_US]
dc.title: Bangla character recognition for Android devices [en_US]
dc.type: Thesis [en_US]
