Analysis of N-Gram based text categorization for Bangla in a newspaper
In this paper, we study the outcome of using an n-gram based algorithm for Bangla text categorization. To analyze the efficiency of this methodology, we used a one-year Prothom-Alo news corpus. Our results show that n-grams of length 2 or 3 are the most useful for categorization. Using gram lengths greater than 3 reduces categorization performance.
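
As an illustration of how n-gram based categorization can work, the following is a minimal sketch. It assumes character n-grams and a rank-order ("out-of-place") profile comparison; the abstract does not state which variant the paper uses, so this should not be read as the authors' method. The function names, the `top_k` cutoff of 300, and the choice to treat each Unicode code point as one character (which splits Bangla conjuncts and vowel signs into separate units) are all assumptions made for illustration.

```python
from collections import Counter


def char_ngrams(text, n):
    """Return all overlapping character n-grams of length n in text."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def build_profile(text, n, top_k=300):
    """Map the top_k most frequent n-grams to their rank (0 = most frequent).

    top_k=300 is an illustrative default, not a value from the paper.
    """
    counts = Counter(char_ngrams(text, n))
    ranked = [gram for gram, _ in counts.most_common(top_k)]
    return {gram: rank for rank, gram in enumerate(ranked)}


def out_of_place(doc_profile, cat_profile):
    """Sum of rank differences; n-grams missing from the category profile
    are charged a fixed maximum penalty (the category profile size)."""
    max_penalty = len(cat_profile)
    total = 0
    for gram, rank in doc_profile.items():
        if gram in cat_profile:
            total += abs(rank - cat_profile[gram])
        else:
            total += max_penalty
    return total


def categorize(text, category_profiles, n):
    """Return the category whose profile is closest to the text's profile."""
    doc_profile = build_profile(text, n)
    return min(
        category_profiles,
        key=lambda cat: out_of_place(doc_profile, category_profiles[cat]),
    )
```

In use, one profile would be built per news category from training text with `build_profile(training_text, n)`, and each test article passed to `categorize` with the same `n`. Repeating this for several values of `n` is one way to compare gram lengths, such as the 2 and 3 reported above against longer ones.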