
    Implementation of an Optical Character Reader (OCR) for Bengali language

    Date
    2016-03
    Publisher
    © 2016 Institute of Electrical and Electronics Engineers Inc.
    Author
    Chowdhury, Muhammed Tawfiq
    Islam, Md. Saiful
    Bipul, Baijed Hossain
    Rhaman, Md. Khalilur
    URI
    http://hdl.handle.net/10361/7454
    Citation
    Chowdhury, M. T., Islam, M. S., Bipul, B. H., & Rhaman, M. K. (2015). Implementation of an optical character reader (OCR) for Bengali language. Paper presented at the Proceedings of 2015 International Conference on Data and Software Engineering, ICODSE 2015, 126-131. doi:10.1109/ICODSE.2015.7436984
    Abstract
    Optical Character Recognition (OCR) is the process of extracting text from an image. The main purpose of an OCR system is to produce editable documents from existing paper documents or image files. Developing an OCR system requires a significant number of algorithms, and the system basically works in two phases: character detection and word detection. A more sophisticated approach also performs sentence detection to preserve a document's structure. Researchers have put considerable effort into developing a Bengali OCR, but none of the resulting systems is completely error free. To address this issue, the latest version (3.03) of the Tesseract OCR engine for the Windows operating system is used to develop an OCR for the Bengali language. The OCR's library is built from 18110 characters and 2617 words. The 'Solaimanlipi' font and 200 input files are used to test the accuracy of the OCR. For clean image files, the accuracy of the software is as high as 97.56%. Accuracy is measured as the percentage of correct characters and words.
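    The abstract defines accuracy as the percentage of correct characters and words but does not say how recognized text is aligned with the ground truth. The following is a minimal sketch of one plausible way to compute such a figure, not the authors' method: it assumes an in-order alignment using Python's standard `difflib.SequenceMatcher`, and the function names (`accuracy`, `char_accuracy`, `word_accuracy`) and the sample strings are hypothetical. Characters are counted as Unicode code points, so a Bengali consonant and its vowel sign count as two units.

```python
from difflib import SequenceMatcher


def accuracy(reference, recognized):
    """Percentage of reference units that also appear, in order, in the output.

    Assumption: units are aligned with difflib's matching blocks; the paper
    does not state which alignment it used.
    """
    if not reference:
        # An empty reference is fully matched only by an empty output.
        return 100.0 if not recognized else 0.0
    matcher = SequenceMatcher(None, reference, recognized, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return 100.0 * matched / len(reference)


def char_accuracy(truth: str, output: str) -> float:
    # Compare code point by code point (not grapheme clusters).
    return accuracy(list(truth), list(output))


def word_accuracy(truth: str, output: str) -> float:
    # Compare whitespace-separated words.
    return accuracy(truth.split(), output.split())


if __name__ == "__main__":
    # Hypothetical ground truth and OCR output; the output drops one vowel sign.
    truth = "আমার সোনার বাংলা"
    output = "আমার সোনার বাংল"
    print(f"character accuracy: {char_accuracy(truth, output):.2f}%")
    print(f"word accuracy: {word_accuracy(truth, output):.2f}%")
```

    A single dropped vowel sign lowers character accuracy only slightly but makes the whole word count as wrong, which is why the two figures are usually reported separately.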
    Keywords
    Bengali; OCR; Tesseract
     
    Description
    This conference paper was presented at the International Conference on Data and Software Engineering, ICODSE 2015; Universitas Gadjah Mada, Yogyakarta, Indonesia; 25 November 2015 through 26 November 2015 [© 2015 IEEE]. The definitive version of the conference paper is available at: http://dx.doi.org/10.1109/ICODSE.2015.7436984
    Publisher Link
    http://ieeexplore.ieee.org/document/7436984/
    DOI
    http://dx.doi.org/10.1109/ICODSE.2015.7436984
    Department
    Department of Computer Science and Engineering, BRAC University
    Collections
    • Conference Paper
    • Faculty Publications

    Copyright © 2008-2019 Ayesha Abed Library, Brac University 