Syntactic part of speech tagging guidelines for Bangla text
Abstract
Recently, several techniques have been tested for automatically assigning parts of speech to Bangla text using different tagsets. However, there is still a need for a formally published standard tagset for Bangla that supports syntactic bracketing, together with detailed POS tagging guidelines that show annotators how a word should be tagged in a particular context. This document presents guidelines for part-of-speech annotation of Bangla text to assist the syntactic bracketing task. The tagset consists of 55 tags in total. These tags aim to cover all of the required syntactic categories precisely and to encode the syntactic information needed for advanced linguistic analysis of a morphologically rich language with flexible word order. A simple Brill tagger trained on a manually tagged corpus of around 25,000 words achieved an overall accuracy of 70.6%. This is comparable to the minimal standard set by experimental results for simple supervised learning methods on Bangla text.
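The evaluation above trains a Brill tagger and reports overall accuracy. The following is a minimal sketch of Brill-style transformation-based learning that makes three assumptions. It uses a most-frequent-tag initial tagger, a single rule template ("change tag A to B when the previous tag is C"), and token-level accuracy as the metric. The function names, the `default` tag for unknown words, and the toy English words and tags are hypothetical. They are not the paper's 55-tag tagset, its 25,000-word corpus, or its actual training configuration, so the sketch does not reproduce the reported 70.6% figure.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Optional, Tuple

Sentence = List[Tuple[str, str]]  # (word, gold_tag) pairs
Rule = Tuple[str, str, str]       # (from_tag, to_tag, prev_tag)


def train_lexicon(corpus: List[Sentence]) -> Dict[str, str]:
    """Initial-state tagger: the most frequent training tag for each word."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for sent in corpus:
        for word, tag in sent:
            counts[word][tag] += 1
    return {w: c.most_common(1)[0][0] for w, c in counts.items()}


def initial_tags(words: List[str], lexicon: Dict[str, str], default: str) -> List[str]:
    # Unknown words fall back to a single (hypothetical) default tag.
    return [lexicon.get(w, default) for w in words]


def apply_rule(tags: List[str], rule: Rule) -> List[str]:
    """Change FROM to TO wherever the previous tag is PREV.
    Contexts are read from the tags before this rule is applied."""
    frm, to, prev = rule
    return [to if i > 0 and t == frm and tags[i - 1] == prev else t
            for i, t in enumerate(tags)]


def learn_rules(corpus: List[Sentence], lexicon: Dict[str, str],
                default: str, max_rules: int = 10) -> List[Rule]:
    """Greedily pick the rule with the best net gain (fixes minus breaks)."""
    gold = [[t for _, t in s] for s in corpus]
    current = [initial_tags([w for w, _ in s], lexicon, default) for s in corpus]
    rules: List[Rule] = []
    for _ in range(max_rules):
        # Propose candidates only from positions that are currently wrong.
        candidates = {(cur[i], g[i], cur[i - 1])
                      for cur, g in zip(current, gold)
                      for i in range(1, len(cur)) if cur[i] != g[i]}
        best: Optional[Rule] = None
        best_score = 0
        for rule in sorted(candidates):  # sorted for deterministic tie-breaking
            score = 0
            for cur, g in zip(current, gold):
                new = apply_rule(cur, rule)
                score += sum((n == gt) - (c == gt) for n, c, gt in zip(new, cur, g))
            if score > best_score:
                best, best_score = rule, score
        if best is None:  # no rule improves training accuracy any further
            break
        rules.append(best)
        current = [apply_rule(cur, best) for cur in current]
    return rules


def tag(words: List[str], lexicon: Dict[str, str], default: str,
        rules: List[Rule]) -> List[str]:
    tags = initial_tags(words, lexicon, default)
    for rule in rules:  # rules are applied in the order they were learned
        tags = apply_rule(tags, rule)
    return tags


def accuracy(predicted: List[List[str]], gold: List[List[str]]) -> float:
    """Token-level accuracy: correctly tagged tokens / all tokens."""
    total = sum(len(g) for g in gold)
    correct = sum(p == g for ps, gs in zip(predicted, gold) for p, g in zip(ps, gs))
    return correct / total


# Hypothetical toy corpus with placeholder tags (not the 55-tag Bangla set).
TOY_CORPUS: List[Sentence] = [
    [("they", "PRON"), ("run", "VERB")],
    [("we", "PRON"), ("run", "VERB")],
    [("a", "DET"), ("run", "NOUN")],
]

if __name__ == "__main__":
    lexicon = train_lexicon(TOY_CORPUS)
    rules = learn_rules(TOY_CORPUS, lexicon, default="NOUN")
    print(rules)                                      # [('VERB', 'NOUN', 'DET')]
    print(tag(["a", "run"], lexicon, "NOUN", rules))  # ['DET', 'NOUN']
```

A full Brill tagger uses many templates that look at neighboring words and tags within a window. Restricting the sketch to one template keeps the greedy "fixes minus breaks" scoring loop easy to follow.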