BRAC University Institutional Repository

A double metaphone encoding for approximate name searching and matching in Bangla

dc.contributor.author UzZaman, Naushad
dc.contributor.author Khan, Mumit
dc.date.accessioned 2010-10-04T05:22:33Z
dc.date.available 2010-10-04T05:22:33Z
dc.date.copyright 2005
dc.date.issued 2005
dc.identifier.uri http://hdl.handle.net/10361/312
dc.description Includes bibliographical references (page 6).
dc.description.abstract Almost any word can be a Bangla name, and a given name is often spelled in many different ways, all of which are considered correct and interchangeable. The reason for the spelling complication is twofold: (1) there is a large gap between script and pronunciation in Bangla, largely attributed to the large-scale Sanskritization process that started in the 12th century and continued throughout the Middle Ages, and (2) typical Bangla names have very different origins, from indigenous names derived primarily from Sanskrit, to imported Muslim names from Persian and Arabic, Christian names from Portuguese, and even names from popular Western TV soap operas. However, the spelling variants of a name always share a large degree of phonetic similarity, which is the key to searching and matching names in records. We present a Double Metaphone encoding for Bangla names that takes into account the various spelling and phonetic rules in use and that applications can use to search for and match names. We encode the spelling variants of a large number of names found in the literature to demonstrate that the encoding does indeed treat the variants of a name as equivalent. A name searching algorithm may employ various figures of merit to narrow the list of possibilities when searching for similar names; we demonstrate one such figure of merit, based on name encoding and edit distance, that has shown good promise. en_US
dc.format.extent 6 pages
dc.language.iso en en_US
dc.publisher BRAC University en_US
dc.subject Name searching en_US
dc.subject Name encoding en_US
dc.subject Phonetic encoding en_US
dc.subject Double metaphone encoding en_US
dc.subject Bangla language en_US
dc.title A double metaphone encoding for approximate name searching and matching in Bangla en_US
dc.type Article en_US
dc.contributor.department Center for Research on Bangla Language Processing (CRBLP), BRAC University
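
The abstract mentions a figure of merit that combines name encoding with edit distance. The record does not give the encoding rules or the exact formula, so the following is only a minimal sketch of the general idea under stated assumptions: `toy_key` is a hypothetical stand-in for a phonetic code that works on romanized spellings (it is not the paper's Double Metaphone encoding for Bangla), and `score` and `rank` are illustrative names, not taken from the paper. Candidates are ordered first by the edit distance between their keys, then by the edit distance between the raw spellings.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance using the standard two-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def toy_key(name: str) -> str:
    """Hypothetical phonetic key for romanized names (NOT the paper's rules)."""
    s = name.lower()
    # Assumed, illustrative equivalences between common romanization variants.
    for src, dst in (("ph", "f"), ("sh", "s"), ("oo", "u"),
                     ("ee", "i"), ("w", "v"), ("z", "j")):
        s = s.replace(src, dst)
    # Collapse adjacent repeated letters and ignore non-letters.
    collapsed = []
    for ch in s:
        if ch.isalpha() and (not collapsed or collapsed[-1] != ch):
            collapsed.append(ch)
    # Keep the first letter; drop vowels everywhere else.
    head, tail = collapsed[:1], collapsed[1:]
    return "".join(head + [c for c in tail if c not in "aeiou"])


def score(query: str, candidate: str) -> tuple:
    """Illustrative figure of merit: (key distance, raw distance); lower is better."""
    key_dist = edit_distance(toy_key(query), toy_key(candidate))
    raw_dist = edit_distance(query.lower(), candidate.lower())
    return (key_dist, raw_dist)


def rank(query: str, candidates: list) -> list:
    """Order candidates from most to least similar to the query."""
    return sorted(candidates, key=lambda c: score(query, c))


print(rank("Mumit", ["Khan", "Moomit", "Mamun"]))
```

Comparing the keys first means that spelling variants with the same code rank ahead of names that merely look similar; the raw edit distance then breaks ties. A real system would substitute the paper's encoding for `toy_key`.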


Files in this item

File: A double metaphone encoding for approximate.pdf
Size: 393.1 KB
Format: PDF
