BRAC University Institutional Repository

A proposed automated extraction procedure of Bangla text for corpus creation in Unicode


dc.contributor.author Pavel, Dewan Shahriar Hossain
dc.contributor.author Sarkar, Asif Iqbal
dc.contributor.author Khan, Dr. Mumit
dc.date.accessioned 2010-12-08T04:07:37Z
dc.date.available 2010-12-08T04:07:37Z
dc.date.issued 2006
dc.identifier.uri http://hdl.handle.net/10361/672
dc.description.abstract This paper addresses automated Bangla corpus creation, which will significantly help lexicon development, morphological analysis, automatic part-of-speech detection, automatic grammar extraction, and machine translation. The plan is to collect all free Bangla documents on the World Wide Web, together with available offline documents, and extract all the words in them to build a large repository of text. This body of text, or corpus, will be used for several Bangla language processing purposes after it is converted to Unicode text. The conversion process is itself an associated and equally important research and development issue. Among several possible procedures, our research focuses on a combination of font and language detection and Unicode conversion of the retrieved Bangla text as a solution for automatic Bangla corpus creation; the paper describes this methodology. en_US
dc.language.iso en en_US
dc.publisher Center for Research on Bangla Language Processing (CRBLP), BRAC University en_US
dc.subject Corpus en_US
dc.subject TTF (TrueType font) en_US
dc.subject OTF (OpenType font) en_US
dc.subject Unicode en_US
dc.subject Converter en_US
dc.subject Crawler en_US
dc.subject Search engine en_US
dc.subject N-gram en_US
dc.title A proposed automated extraction procedure of Bangla text for corpus creation in Unicode en_US
dc.type Other en_US


Files in this item

File: A PROPOSED AUTOMATED EXTRACTION PROCEDURE.pdf
Size: 386.1 KB
Format: PDF
