    A proposed automated extraction procedure of Bangla text for corpus creation in Unicode

    Date
    2006
    Publisher
    BRAC University
    Author
    Pavel, Dewan Shahriar Hossain
    Sarkar, Asif Iqbal
    Khan, Mumit
    URI
    http://hdl.handle.net/10361/672
    Abstract
    This paper addresses automated Bangla corpus creation, which would significantly help lexicon development, morphological analysis, automatic part-of-speech detection, automatic grammar extraction, and machine translation. The plan is to collect all freely available Bangla documents, both on the World Wide Web and offline, and extract the words in them to build a large repository of text. Once converted to Unicode, this body of text, or corpus, will serve several purposes in Bangla language processing. The conversion process is itself an associated and equally important research and development issue. Among several possible procedures, our research focuses on a combination of font and language detection and Unicode conversion of the retrieved Bangla text as a solution for automatic Bangla corpus creation; the paper describes the methodology.
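
The abstract names three stages (language detection, font detection, and Unicode conversion) without giving details on this page. The following is only a minimal sketch of how such stages might look, not the authors' method. It assumes that Unicode Bangla can be recognised by the Bengali block (U+0980 to U+09FF), that character n-gram overlap against a reference sample could hint at which encoding a text uses, and that a legacy font encoding can be approximated by a per-character table. The names `bangla_ratio`, `profile_overlap`, `convert_legacy`, and the table `HYPOTHETICAL_LEGACY_MAP` are invented for illustration; the table does not correspond to any real font.

```python
from collections import Counter

# Unicode Bengali block, used for the Bangla script.
BANGLA_START, BANGLA_END = 0x0980, 0x09FF


def bangla_ratio(text):
    """Fraction of non-whitespace characters that fall in the Bengali block."""
    chars = [ch for ch in text if not ch.isspace()]
    if not chars:
        return 0.0
    hits = sum(1 for ch in chars if BANGLA_START <= ord(ch) <= BANGLA_END)
    return hits / len(chars)


def char_ngrams(text, n=2):
    """Count overlapping character n-grams in text."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def profile_overlap(sample, reference, n=2):
    """Share of the sample's n-gram occurrences that also appear in the reference."""
    sample_counts = char_ngrams(sample, n)
    reference_counts = char_ngrams(reference, n)
    total = sum(sample_counts.values())
    if total == 0:
        return 0.0
    shared = sum(c for gram, c in sample_counts.items() if gram in reference_counts)
    return shared / total


# ASSUMPTION: a toy table, not the mapping of any real legacy Bangla font.
HYPOTHETICAL_LEGACY_MAP = {
    "A": "\u0985",  # BENGALI LETTER A
    "K": "\u0995",  # BENGALI LETTER KA
    "L": "\u0996",  # BENGALI LETTER KHA
}


def convert_legacy(text, mapping):
    """Replace each mapped character; leave unmapped characters unchanged."""
    return "".join(mapping.get(ch, ch) for ch in text)
```

For example, `bangla_ratio` could be used to decide whether a retrieved page is already Unicode Bangla, and `convert_legacy` applied only when it is not. A real converter would likely need more than a per-character table, since legacy glyph encodings often store some vowel signs before the consonant while Unicode stores them after it.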
    Keywords
    Corpus; TTF (true type font); OTF (open type font); Unicode; Converter; Crawler; Search engine; N-gram
     
    Description
    Includes bibliographical references (page 5).
    Department
    Center for Research on Bangla Language Processing (CRBLP), BRAC University
    Collections
    • Conference Papers (Centre for Research on Bangla Language Processing)

    Copyright © 2008-2019 Ayesha Abed Library, Brac University 