History (Forward N-Gram) or Future (Backward N-Gram)? Which Model to Consider for N-Gram Analysis in Bangla?
Abstract
This paper examines whether forward or backward n-gram modeling is more advantageous for Bangla. The most commonly used n-gram analysis is the forward n-gram. However, in Bangla, a backward n-gram appears to be consistently more successful and yields more grammatical results than a forward n-gram. This paper hypothesizes that the rationale behind this success is the syntactic ordering of constituents in Bangla. Bangla is a head-final, specifier-initial language, as opposed to English, which is head-initial,
specifier-initial. Hence, in Bangla, the head comes after its argument in a phrase. If an n-gram analysis begins with a head and moves backwards, it will reach that head's own argument, but if it moves forwards, it will probably pick up the argument of another head. As the probability of occurrence of heads is higher, the probability of depending on a head is also higher, and hence a backward n-gram will probably have a
greater chance of yielding grammatical results. We carried out several experiments comparing the two directions in different applications, and the results show an advantage for the backward direction. This should prove a useful linguistic insight for n-gram based analysis, depending upon variations in constituent analysis.
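
To make the forward/backward distinction concrete, the following is a minimal sketch of a bigram counter that conditions each word either on the preceding word (forward) or on the following word (backward). It is an illustration only, not the implementation used in the experiments; the function names `train_bigram` and `prob`, the boundary markers, and the toy tokens are all assumptions introduced here.

```python
from collections import Counter, defaultdict

def train_bigram(sentences, direction="forward"):
    """Count bigrams. The context is the previous token (forward)
    or the next token (backward). Hypothetical helper for illustration."""
    counts = defaultdict(Counter)
    for tokens in sentences:
        padded = ["<s>"] + list(tokens) + ["</s>"]
        if direction == "backward":
            # Reversing the sentence turns "next token" into "previous token",
            # so the same counting loop serves both directions.
            padded = padded[::-1]
        for context, word in zip(padded, padded[1:]):
            counts[context][word] += 1
    return counts

def prob(counts, context, word):
    """Unsmoothed maximum-likelihood estimate of P(word | context)."""
    seen = counts.get(context, Counter())
    total = sum(seen.values())
    return seen[word] / total if total else 0.0

# Toy corpus of placeholder tokens (not real Bangla data).
corpus = [["a", "x"], ["b", "x"], ["a", "y"]]
fwd = train_bigram(corpus, "forward")
bwd = train_bigram(corpus, "backward")

p_forward = prob(fwd, "a", "x")    # P(next word is "x" | current word is "a")
p_backward = prob(bwd, "y", "a")   # P(previous word is "a" | current word is "y")
```

In this sketch the backward model starts from the later word and predicts what precedes it, which corresponds to starting from a phrase-final head and reaching back toward its argument.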