Bangla text extraction by digital image processing

Hoque, A. K. M Rashedul; Pandit, Proma Mrittika; Nasreen, Najia; Raihan, Hasin

dc.contributor.advisor	Uddin, Dr. Jia
dc.contributor.author	Hoque, A. K. M Rashedul
dc.contributor.author	Pandit, Proma Mrittika
dc.contributor.author	Nasreen, Najia
dc.contributor.author	Raihan, Hasin
dc.date.accessioned	2018-05-14T06:32:34Z
dc.date.available	2018-05-14T06:32:34Z
dc.date.copyright	2017
dc.date.issued	2017-12
dc.identifier.other	ID 13301049
dc.identifier.other	ID 13301032
dc.identifier.other	ID 13301050
dc.identifier.other	ID 13301102
dc.identifier.uri	http://hdl.handle.net/10361/10142
dc.description	This thesis is submitted in partial fulfilment of the requirements for the degree of Bachelor of Science in Computer Science and Engineering, 2017.	en_US
dc.description	Cataloged from PDF version of thesis.
dc.description	Includes bibliographical references (pages 32-34).
dc.description.abstract	Optical character recognition (OCR) is a technology for extracting text from an image. The main purpose of OCR is to produce editable text documents from scanned documents, image files, or books. Bangla script contains text of varying shapes and sizes, which makes extracting Bengali text from images challenging. In this paper, we discuss the process of developing an OCR system for the Bangla language, focusing on training data preparation, Tesseract integration for character recognition, and post-processing techniques. Before recognition, scanned documents need a few preprocessing steps, such as noise removal, grayscale conversion, and binarization. We present the basic steps required to develop a Bangla OCR and a complete development workflow, along with the probable errors encountered during recognition using several techniques. We used Tesseract version 3.04 for the Windows operating system and the ‘NIKOSH’ Bangla font in this project. For clear documents, a word-level recognition accuracy of around 95% was obtained.	en_US
dc.description.statementofresponsibility	A. K. M Rashedul Hoque
dc.description.statementofresponsibility	Proma Mrittika Pandit
dc.description.statementofresponsibility	Najia Nasreen
dc.description.statementofresponsibility	Hasin Raihan
dc.format.extent	34 pages
dc.language.iso	en	en_US
dc.rights	BRAC University theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission.
dc.subject	Image processing	en_US
dc.subject	Text extraction	en_US
dc.subject	Bangla language	en_US
dc.title	Bangla text extraction by digital image processing	en_US
dc.type	Thesis	en_US
dc.contributor.department	Department of Computer Science and Engineering, BRAC University
dc.description.degree	B. Computer Science and Engineering

Files in this item

Name: 13301049,13301032,13301050,133 ...
Size: 1.432Mb
Format: PDF


This item appears in the following Collection(s)

Thesis & Report, BSc (Computer Science and Engineering) [1589]
