Analysis of N-gram-based text categorization for Bangla in a newspaper corpus
Abstract
The goal of any classification task is to build a set of models that can correctly predict the class of different objects. Text categorization is one such classification problem and underlies many applications, e.g., news categorization, language identification, authorship attribution, text genre categorization, and recommendation systems. In this paper, we analyze the performance of n-gram-based text categorization for Bangla on a corpus drawn from the Bangladeshi newspaper Prothom-Alo.