
dc.contributor.advisor: Uddin, Dr. Jia
dc.contributor.author: Hoque, A. K. M Rashedul
dc.contributor.author: Pandit, Proma Mrittika
dc.contributor.author: Nasreen, Najia
dc.contributor.author: Raihan, Hasin
dc.date.accessioned: 2018-05-14T06:32:34Z
dc.date.available: 2018-05-14T06:32:34Z
dc.date.copyright: 2017
dc.date.issued: 2017-12
dc.identifier.other: ID 13301049
dc.identifier.other: ID 13301032
dc.identifier.other: ID 13301050
dc.identifier.other: ID 13301102
dc.identifier.uri: http://hdl.handle.net/10361/10142
dc.description: This thesis is submitted in partial fulfilment of the requirements for the degree of Bachelor of Science in Computer Science and Engineering, 2017. [en_US]
dc.description: Cataloged from PDF version of thesis.
dc.description: Includes bibliographical references (pages 32-34).
dc.description.abstract: Optical character recognition (OCR) is a technology for extracting text from an image. The main purpose of OCR is to produce editable text documents from scanned documents, image files, or books. Bangla script contains text of different shapes and sizes, which makes extracting Bengali text from images challenging. In this paper, we discuss the process of developing an OCR for the Bengali language, focusing on the training data preparation process, the Tesseract integration procedure for character recognition, and the post-processing techniques. Before the recognition step, a few preprocessing steps are needed for scanned documents, such as noise removal, conversion to grayscale, and binarization. We present the basic steps required for developing a Bangla OCR and a complete workflow for the development process, along with the probable errors encountered during recognition using several techniques. We used Tesseract version 3.04 for the Windows operating system and the ‘NIKOSH’ Bangla font in this project. For clear documents, around 95% word-level recognition accuracy was obtained. [en_US]
dc.description.statementofresponsibility: A. K. M Rashedul Hoque
dc.description.statementofresponsibility: Proma Mrittika Pandit
dc.description.statementofresponsibility: Najia Nasreen
dc.description.statementofresponsibility: Hasin Raihan
dc.format.extent: 34 pages
dc.language.iso: en [en_US]
dc.rights: BRAC University theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission.
dc.subject: Image processing [en_US]
dc.subject: Text extraction [en_US]
dc.subject: Bangla language [en_US]
dc.title: Bangla text extraction by digital image processing [en_US]
dc.type: Thesis [en_US]
dc.contributor.department: Department of Computer Science and Engineering, BRAC University
dc.description.degree: B. Computer Science and Engineering

