Recently added

Automatic Bangla corpus creation

Sarkar, Asif Iqbal; Pavel, Dewan Shahriar Hossain; Khan, Mumit (BRAC University, 2007)

This paper addresses the issue of automatic Bangla corpus creation, which will significantly help the processes of Lexicon development, Morphological Analysis, Automatic Parts of Speech Detection and Automatic grammar ...

A comprehensive Bangla spelling checker

Naushad UzZaman; Khan, Mumit (BRAC University, 2006)

We present a comprehensive Bangla spelling checker that improves the quality of suggestions for misspelled words. The complex rules for Bangla spelling present a significant challenge in producing suggestions for a ...

A survey on script segmentation for Bangla OCR

Abduallah, Arif Billah Al-Mahmud; Khan, Mumit (Center for research on Bangla language processing (CRBLP), BRAC University, 2007)

Script segmentation is an important primary task for any Optical Character Recognition (OCR) software. It is especially important for off-line OCR of printed characters. Through script segmentation a big ...

Text normalization system for Bangla

Alam, Firoj; Habib, S. M. Murtoza; Khan, Mumit (BRAC University, 2008)

This paper describes a text normalization system for the Bangla language (exonym: Bengali) that works by identifying the semiotic classes in a Bangla text corpus. After identifying the semiotic classes a set of rules was ...
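As a rough illustration of what identifying semiotic classes can look like, the sketch below assigns each token to a class with regular expressions. The class names (`date`, `decimal`, `cardinal`, `word`), the patterns, and the function `semiotic_class` are assumptions made for this example; the listing does not show the paper's actual classes or rules.

```python
import re

# Hypothetical digit class: ASCII digits plus Bangla digits (U+09E6 to U+09EF).
D = "[0-9০-৯]"

# Hypothetical semiotic classes, checked in order from most to least specific.
RULES = [
    ("date", re.compile(rf"{D}{{1,2}}[/-]{D}{{1,2}}[/-]{D}{{2,4}}")),
    ("decimal", re.compile(rf"{D}+\.{D}+")),
    ("cardinal", re.compile(rf"{D}+")),
]


def semiotic_class(token: str) -> str:
    """Return the first class whose pattern matches the whole token."""
    for name, pattern in RULES:
        if pattern.fullmatch(token):
            return name
    # Anything that is not a recognized non-standard token is an ordinary word.
    return "word"


for tok in ["১২/০৩/২০০৮", "3.14", "২০০৮", "বাংলা"]:
    print(tok, semiotic_class(tok))
```

A real system would then apply class-specific expansion rules (for example, spelling out a cardinal number in words); that step is not sketched here.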

Research report on Bengali NLP engine for TTS

Alam, Firoj (BRAC University, 2008-04-07)

This report describes the Bengali NLP processor for TTS, along with the challenges faced in developing the NLP processor.

Research report on parallel corpus translation challenges and processes

Khan, Mumit (BRAC University, 2007-10-08)

We describe some of the challenges in developing English-Bangla parallel corpora, and look at some of the established processes used by other language corpora for solutions to some of these challenges.

Research report on Translations of gTLDs and ccTLDs in Bangla

Alam, Firoj; Habib, Murtoza; Hayder, Kamrul; Khan, Mumit (BRAC University, 2007-10-08)

This report describes the initial translations of gTLDs and ccTLDs in Bengali, along with the challenges faced in creating the translations.

Research report on Bangla wordnet development challenges and solutions

Khan, Mumit (BRAC University, 2007-10-08)

We describe the initial design of Bangla WordNet (BWN), based on the English WordNet 2.2 distribution from Princeton University. Our goal is to develop a 5,000 entry Bangla WordNet over the next two years. At present, we ...

Acoustic analysis of Bangla vowel inventory

Alam, Firoj; Habib, S. M. Murtoza; Khan, Mumit (BRAC University, 2008)

This paper describes the acoustic characteristics of Bangla vowels, obtained by analyzing the recordings of male and female voices. First, the duration of each phoneme was identified by averaging both the male and female ...

Acoustic analysis of Bangla consonants

Alam, Firoj; Habib, S. M. Murtoza; Khan, Mumit (BRAC University, 2008)

This paper describes the acoustic characteristics of Bangla consonants, obtained by analyzing the recordings of male and female voices. First, the duration of each phoneme was identified by averaging both the male and ...

A high performance domain specific OCR for Bangla script

Hasnat, Md. Abul; Habib, S. M. Murtoza; Khan, Mumit (BRAC University, 2008)

Research on recognizing Bengali script has been under way since the mid-1980s. A variety of techniques have been applied and their performance examined. In this paper we present a high performance domain ...

Syntactic part of speech tagging guidelines for Bangla text

Mahmud, Altaf; Khan, Mumit (BRAC University, 2009)

Recently, several techniques have been tested to automatically assign parts of speech to Bangla texts using different tag sets. But there is always a need for a standard tagset for Bangla that has been formally published ...

Example based English-Bengali machine translation using wordnet

Salam, Khan Md. Anwarus; Khan, Mumit; Nishino, Tetsuro (BRAC University, 2009)

In this paper we propose an architecture of English-Bengali Example Based Machine Translation (EBMT) using WordNet. The proposed EBMT system has five steps: 1) Tagging 2) Parsing 3) Prepare the chunks of the sentence using ...

Development of annotated Bangla speech corpora

Alam, Firoj; Habib, S. M. Murtoza; Sultana, Dil Afroza; Khan, Mumit (BRAC University, 2010)

This paper describes the development procedure of three different Bangla read speech corpora which can be used for phonetic research and developing speech applications. Several criteria were maintained in the corpora ...

Integrating Bangla script recognition support in tesseract OCR

Hasnat, Md. Abul; Chowdhury, Muttakinur Rahman; Khan, Mumit (BRAC University, 2009)

Tesseract is considered one of the most accurate free software OCR engines currently available. It was originally developed by Hewlett-Packard from 1985 until 1995, and is currently maintained by Google. At present, Tesseract ...

Rule based automated pronunciation generator

Mosaddeque, Ayesha Binte; UzZaman, Naushad; Khan, Mumit (BRAC University, 2006)

This paper presents a rule based pronunciation generator for Bangla words. It takes a word and finds the pronunciations for the graphemes of the word. A grapheme is a unit in writing that cannot be analyzed into smaller ...

N-gram based statistical grammar checker for Bangla and English

Alam, Md. Jahangir; UzZaman, Naushad; Khan, Mumit (Center for research on Bangla language processing (CRBLP), BRAC University, 2006)

This paper describes a statistical grammar checker, which considers the n-gram based analysis of words and POS tags to decide whether the sentence is grammatically correct or not. We employed this technique for both Bangla ...
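The idea of judging a sentence by n-grams of POS tags can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses bigrams only, a toy hand-tagged corpus, made-up tag names, and a hypothetical acceptance rule (every tag bigram must have been seen at least `threshold` times in training).

```python
from collections import Counter


def train_bigrams(tagged_sentences):
    """Count POS-tag bigrams, padding each sentence with boundary markers."""
    counts = Counter()
    for tags in tagged_sentences:
        padded = ["<s>"] + list(tags) + ["</s>"]
        counts.update(zip(padded, padded[1:]))
    return counts


def is_grammatical(tags, counts, threshold=1):
    """Accept the sentence only if every tag bigram occurs at least `threshold` times."""
    padded = ["<s>"] + list(tags) + ["</s>"]
    return all(counts[bigram] >= threshold for bigram in zip(padded, padded[1:]))


# Toy training data: each sentence is already reduced to its POS-tag sequence.
corpus = [
    ["PRON", "NOUN", "VERB"],
    ["NOUN", "VERB"],
    ["PRON", "ADJ", "NOUN", "VERB"],
]
counts = train_bigrams(corpus)

print(is_grammatical(["PRON", "NOUN", "VERB"], counts))  # all bigrams seen in training
print(is_grammatical(["VERB", "PRON", "NOUN"], counts))  # starts with an unseen bigram
```

A hard threshold like this rejects any sentence containing an unseen tag sequence, so a practical checker would need a much larger tagged corpus or some form of smoothing.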

Minimally segmenting high performance Bangla optical character recognition using Kohonen network

Shatil, Adnan Mohammad Shoeb; Khan, Mumit (BRAC University, 2006)

This paper presents a method to use a Kohonen neural network based classifier in a Bangla Optical Character Recognition (OCR) system, providing much higher performance than the traditional neural network based ones. It describes ...

JKimmo: A Multilingual computational morphology framework for PC-KIMMO

Islam, Md. Zahurul; Khan, Mumit (BRAC University, 2006)

Morphological analysis is of fundamental interest in computational linguistics and language processing. While there are established morphological analyzers for mostly Western and a few other languages using localized ...

Infrastructure for Bangla information retrieval in context of ICT for development

Haque, Nafid; Ali, M Hammad; Abduallah, Matin Saad (BRAC University, 2006)

In this paper, we talk about developing a search engine and information retrieval system for Bangla. Current work done in this area assumes the use of a particular type of encoding or the availability of particular facilities ...

Centre for Research on Bangla Language Processing (CRBLP): Recent submissions
