dc.contributor.advisor | Uddin, Dr. Jia | |
dc.contributor.author | Hoque, A. K. M Rashedul | |
dc.contributor.author | Pandit, Proma Mrittika | |
dc.contributor.author | Nasreen, Najia | |
dc.contributor.author | Raihan, Hasin | |
dc.date.accessioned | 2018-05-14T06:32:34Z | |
dc.date.available | 2018-05-14T06:32:34Z | |
dc.date.copyright | 2017 | |
dc.date.issued | 2017-12 | |
dc.identifier.other | ID 13301049 | |
dc.identifier.other | ID 13301032 | |
dc.identifier.other | ID 13301050 | |
dc.identifier.other | ID 13301102 | |
dc.identifier.uri | http://hdl.handle.net/10361/10142 | |
dc.description | This thesis is submitted in partial fulfilment of the requirements for the degree of Bachelor of Science in Computer Science and Engineering, 2017. | en_US |
dc.description | Cataloged from PDF version of thesis. | |
dc.description | Includes bibliographical references (pages 32-34). | |
dc.description.abstract | Optical character recognition (OCR) is a technology for extracting text from images.
The main purpose of OCR is to create editable text documents from scanned documents, image
files, or books. In this paper, we discuss the process of developing an OCR system for the Bangla
language. Bangla script contains characters of different shapes and sizes, which makes extracting
Bangla text from images challenging. We focus on the training data preparation process, the
procedure for integrating Tesseract for character recognition, and the post-processing techniques.
Before recognition, scanned documents require a few preprocessing steps, such as noise removal,
grayscale conversion, and binarization. We present the basic steps required to develop a Bangla
OCR, a complete development workflow, and the probable errors encountered during recognition
with several techniques. We used Tesseract version 3.04 on the Windows operating system and the
‘NIKOSH’ Bangla font in this project. For clear documents, a word-level recognition accuracy of
around 95% was obtained. | en_US |
dc.description.statementofresponsibility | A. K. M Rashedul Hoque | |
dc.description.statementofresponsibility | Proma Mrittika Pandit | |
dc.description.statementofresponsibility | Najia Nasreen | |
dc.description.statementofresponsibility | Hasin Raihan | |
dc.format.extent | 34 pages | |
dc.language.iso | en | en_US |
dc.rights | BRAC University theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission. | |
dc.subject | Image processing | en_US |
dc.subject | Text extraction | en_US |
dc.subject | Bangla language | en_US |
dc.title | Bangla text extraction by digital image processing | en_US |
dc.type | Thesis | en_US |
dc.contributor.department | Department of Computer Science and Engineering, BRAC University | |
dc.description.degree | B. Computer Science and Engineering | |