BRAC University Institutional Repository

Automatic Bangla corpus creation

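dc.contributor.author Sarkar, Asif Iqbal
dc.contributor.author Pavel, Dewan Shahriar Hossain
dc.contributor.author Khan, Mumit
dc.date.accessioned 2010-10-28T03:51:23Z
dc.date.available 2010-10-28T03:51:23Z
dc.date.copyright 2007
dc.date.issued 2007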
dc.description Includes bibliographical references (pages 4-5).
dc.description.abstract This paper addresses automatic Bangla corpus creation, which will significantly help lexicon development, morphological analysis, automatic part-of-speech detection, automatic grammar extraction, and machine translation. The plan is to collect all freely available Bangla documents, both on the World Wide Web and offline, and extract all the words in them to build a large repository of text. This body of text, or corpus, will be used for several Bangla language processing purposes after it is converted to Unicode text. The conversion process is itself an associated and equally important research and development issue. Among several possible procedures, our research focuses on a combination of font and language detection and Unicode conversion of the retrieved Bangla text as a solution for automatic Bangla corpus creation; the paper describes the methodology. en_US
dc.description.statementofresponsibility Asif Iqbal Sarkar
dc.description.statementofresponsibility Dewan Shahriar Hossain Pavel
dc.description.statementofresponsibility Mumit Khan
dc.format.extent 5 pages
dc.language.iso en en_US
dc.publisher BRAC University en_US
dc.title Automatic Bangla corpus creation en_US
dc.type Technical report en_US
dc.contributor.department Center for Research on Bangla Language Processing (CRBLP), BRAC University
