Bangla character recognition for Android devices

Manzur, Shahrin; Islam, Shafiqul; Foysal, Abu; Chowdhury, Aparajita

dc.contributor.author	Manzur, Shahrin
dc.contributor.author	Islam, Shafiqul
dc.contributor.author	Foysal, Abu
dc.contributor.author	Chowdhury, Aparajita
dc.date.accessioned	2016-01-19T13:20:26Z
dc.date.available	2016-01-19T13:20:26Z
dc.date.issued	2015-12
dc.identifier.other	ID 12101113
dc.identifier.other	ID 12101128
dc.identifier.other	ID 12101131
dc.identifier.other	ID 12301056
dc.identifier.uri	http://hdl.handle.net/10361/4894
dc.description	This thesis report is submitted in partial fulfillment of the requirements for the degree of Bachelor of Science in Computer Science and Engineering, 2015.	en_US
dc.description.abstract	In this paper, we describe our attempt to create editable documents from images by retrieving the text they contain, a process widely known as Optical Character Recognition (OCR). We set out to build an Android application for recognizing Bengali characters. Several attempts have previously been made to develop a Bengali OCR, but their limitations motivated us to work on this project. To recognize more characters and joint letters, we focused on reducing the error rate so that more of the text is preserved. For this purpose, we found the Tesseract OCR engine and the Leptonica image processing library to be the best options. Tesseract is used to recognize the characters, and Leptonica is used to build an Android application by extracting data from the text. We use Tesseract 3.03, the version currently available, for this project. We also demonstrate how we obtained better results by manipulating Tesseract along with Serak to create box files and trained data. In addition, we discuss how we dealt with joint letters, dangerous ambiguity, and contrast issues in order to increase efficiency. Finally, we present our analyzed data, our progress, and the future scope for improvement.	en_US
dc.language.iso	en	en_US
dc.publisher	BRAC University	en_US
dc.subject	Optical Character Recognition (OCR)	en_US
dc.subject	Tesseract	en_US
dc.subject	Bangla language	en_US
dc.subject	Android	en_US
dc.subject	Leptonica	en_US
dc.title	Bangla character recognition for Android devices	en_US
dc.type	Thesis	en_US

Files in this item

Name: 12101113.pdf
Size: 2.330Mb
Format: PDF

This item appears in the following Collection(s)

Thesis & Report, BSc (Computer Science and Engineering) [1586]
