dc.contributor.author | Sarkar, Asif Iqbal | |
dc.contributor.author | Pavel, Dewan Shahriar Hossain | |
dc.contributor.author | Khan, Mumit | |
dc.date.accessioned | 2010-10-28T03:51:23Z | |
dc.date.available | 2010-10-28T03:51:23Z | |
dc.date.copyright | 2007 | |
dc.date.issued | 2007 | |
dc.identifier.uri | http://hdl.handle.net/10361/652 | |
dc.description | Includes bibliographical references (pages 4-5). | |
dc.description.abstract | This paper addresses the issue of automatic Bangla corpus creation, which will significantly help the processes of lexicon development, morphological analysis, automatic part-of-speech detection, automatic grammar extraction, and machine translation. The plan is to collect all free Bangla documents on the World Wide Web, together with available offline documents, and extract all the words in them to build a large repository of text. This body of text, or corpus, will be used for several Bangla language processing purposes after it is converted to Unicode text. The conversion process is itself an associated and equally important research and development issue. Among several possible procedures, our research focuses on a combination of font and language detection and Unicode conversion of the retrieved Bangla text as a solution for automatic Bangla corpus creation; the methodology is described in the paper. | en_US |
dc.description.statementofresponsibility | Asif Iqbal Sarkar | |
dc.description.statementofresponsibility | Dewan Shahriar Hossain Pavel | |
dc.description.statementofresponsibility | Mumit Khan | |
dc.format.extent | 5 pages | |
dc.language.iso | en | en_US |
dc.publisher | BRAC University | en_US |
dc.title | Automatic Bangla corpus creation | en_US |
dc.type | Technical report | en_US |
dc.contributor.department | Center for Research on Bangla Language Processing (CRBLP), BRAC University | |