A Double Metaphone encoding for approximate name searching and matching in Bangla
Abstract
Almost any word can be a Bangali name, and a given name is often spelled in many different ways, all of which are considered
correct and interchangeable. The reason for this spelling variation is twofold: (1) there is a large gap between script and pronunciation in Bangla, largely attributed to the large-scale Sanskritization process that started in the 12th century and
continued throughout the Middle Ages, and (2) typical Bangla names have very different origins, ranging from indigenous names
derived primarily from Sanskrit, to Muslim names imported from Persian and Arabic, Christian names from Portuguese, and
even names from popular Western TV soap operas. However, there is always a large degree of phonetic similarity in
the spelling variants of a name, which is the key to searching and matching names in records. We present a Double Metaphone encoding for Bangla names that takes into account the various spelling and phonetic rules in use and can be used by
applications to search for and match names. We encode the spelling variants of a large number of names found in the literature to demonstrate that the variants of a name are indeed encoded as equivalent. A name-searching algorithm may employ various figures of merit to narrow the list of possibilities when searching for similar names; we demonstrate one such figure of merit, based on name encoding and
edit distance, that has shown good promise.
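
As a rough illustration of how a phonetic code and an edit distance might be combined into a figure of merit, the following Python sketch ranks candidate names against a query. It is not the encoding or the figure of merit of this paper: `toy_encode` is a hypothetical stand-in that works on romanized spellings (the actual encoding applies Bangla-specific spelling and phonetic rules), and the ranking rule in `rank_candidates` (shared code first, then smallest Levenshtein distance) is an assumption made only for the example.

```python
from typing import Callable, Iterable, List, Tuple


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via the standard two-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def toy_encode(name: str) -> Tuple[str, ...]:
    """Hypothetical placeholder encoder, NOT the paper's rules.

    Keeps the first letter, drops later vowels, and collapses
    immediately repeated letters. Returns a tuple so that a real
    Double Metaphone encoder (primary, secondary) can be swapped in.
    """
    s = name.upper()
    out = s[:1]
    for ch in s[1:]:
        if ch in "AEIOU" or ch == out[-1]:
            continue
        out += ch
    return (out,)


def rank_candidates(
    query: str,
    candidates: Iterable[str],
    encode: Callable[[str], Tuple[str, ...]] = toy_encode,
) -> List[str]:
    """Assumed figure of merit: names sharing a code with the query
    come first, then names are ordered by edit distance to the query.
    Remaining ties are broken alphabetically."""
    q_codes = set(encode(query))
    scored = []
    for name in candidates:
        shares_code = bool(q_codes & set(encode(name)))
        scored.append((0 if shares_code else 1,
                       edit_distance(query, name),
                       name))
    return [name for _, _, name in sorted(scored)]


if __name__ == "__main__":
    print(rank_candidates("Rahman", ["Rehman", "Rahim", "Roman", "Rahaman"]))
```

With the toy encoder, "Rehman" and "Rahaman" share a code with "Rahman" and are therefore ranked ahead of "Roman" and "Rahim", even though all four are close in spelling. Replacing `toy_encode` with an encoder that returns both a primary and a secondary code would leave `rank_candidates` unchanged.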