
dc.contributor.author: Pavel, Dewan Shahriar Hossain
dc.contributor.author: Sarkar, Asif Iqbal
dc.contributor.author: Khan, Mumit
dc.date.accessioned: 2010-12-08T04:07:37Z
dc.date.available: 2010-12-08T04:07:37Z
dc.date.copyright: 2006
dc.date.issued: 2006
dc.identifier.uri: http://hdl.handle.net/10361/672
dc.description: Includes bibliographical references (page 5).
dc.description.abstract: This paper addresses automated Bangla corpus creation, which will significantly help lexicon development, morphological analysis, automatic part-of-speech detection, automatic grammar extraction, and machine translation. The plan is to collect all free Bangla documents on the World Wide Web, along with available offline documents, and extract all the words in them to build a large repository of text. This body of text, or corpus, will be used for several Bangla language processing purposes after it is converted to Unicode text. The conversion process is itself an associated and equally important research and development issue. Among several possible procedures, our research focuses on a combination of font and language detection and Unicode conversion of retrieved Bangla text as a solution for automatic Bangla corpus creation; the paper describes this methodology. [en_US]
dc.description.statementofresponsibility: Dewan Shahriar Hossain Pavel
dc.description.statementofresponsibility: Asif Iqbal Sarkar
dc.description.statementofresponsibility: Mumit Khan
dc.language.iso: en [en_US]
dc.publisher: BRAC University [en_US]
dc.subject: Corpus [en_US]
dc.subject: TTF (true type font) [en_US]
dc.subject: OTF (open type font) [en_US]
dc.subject: Unicode [en_US]
dc.subject: Converter [en_US]
dc.subject: Crawler [en_US]
dc.subject: Search engine [en_US]
dc.subject: N-gram [en_US]
dc.title: A proposed automated extraction procedure of Bangla text for corpus creation in unicode [en_US]
dc.type: Article [en_US]
dc.contributor.department: Center for Research on Bangla Language Processing (CRBLP), BRAC University

