Analysis of N-Gram based text categorization for Bangla in a newspaper corpus

Mansur, Munirul


Analysis of N Gram based text categorization.pdf (521.2Kb)

Date

2006-08

Publisher

BRAC University

Abstract

The goal of any classification is to build a set of models that can correctly predict the class of different objects. Text categorization is one such application and can be used in many classification tasks, e.g. news categorization, language identification, authorship attribution, text genre categorization, and recommendation systems. In this paper we analyze the performance of n-gram based text categorization for Bangla on a corpus drawn from the Bangladeshi newspaper Prothom-Alo.
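As a rough illustration of what "n-gram based text categorization" can mean, the sketch below implements the rank-order profile method with the "out-of-place" distance popularized by Cavnar and Trenkle. This abstract does not say which variant the thesis uses, so treat this as an assumption, not the author's method. The function names (`char_ngrams`, `profile`, `out_of_place`, `classify`) and the defaults (`n_max=3`, `size=300`) are hypothetical. Python strings index Unicode code points, so Bangla vowel signs and the hasanta become separate symbols in each n-gram instead of staying attached to a whole grapheme cluster.

```python
from collections import Counter


def char_ngrams(text: str, n_max: int = 3) -> Counter:
    """Count character n-grams of length 1..n_max.

    Each whitespace-separated word is padded with one space on each side,
    so word-initial and word-final n-grams are distinguishable.
    """
    counts: Counter = Counter()
    for word in text.split():
        padded = f" {word} "
        for n in range(1, n_max + 1):
            for i in range(len(padded) - n + 1):
                counts[padded[i:i + n]] += 1
    return counts


def profile(text: str, n_max: int = 3, size: int = 300) -> list[str]:
    """Return the `size` most frequent n-grams, most frequent first.

    Ties keep first-seen order (Counter.most_common behaviour).
    """
    return [gram for gram, _ in char_ngrams(text, n_max).most_common(size)]


def out_of_place(doc_profile: list[str], cat_profile: list[str]) -> int:
    """Out-of-place distance: sum of rank differences between the profiles.

    An n-gram missing from the category profile gets the maximum penalty,
    here taken to be the length of the category profile.
    """
    rank = {gram: r for r, gram in enumerate(cat_profile)}
    max_penalty = len(cat_profile)
    return sum(
        abs(r - rank[gram]) if gram in rank else max_penalty
        for r, gram in enumerate(doc_profile)
    )


def classify(text: str, category_profiles: dict[str, list[str]],
             n_max: int = 3, size: int = 300) -> str:
    """Assign `text` to the category whose profile is nearest."""
    doc = profile(text, n_max, size)
    return min(category_profiles,
               key=lambda cat: out_of_place(doc, category_profiles[cat]))
```

To use it, build one profile per news category from its training articles (e.g. `{"sports": profile(sports_text), "economy": profile(economy_text)}`), then call `classify` on each unseen article. Comparing only the ranks of the most frequent n-grams makes the measure tolerant of differences in document length. The tradeoff is that the choice of `size` and `n_max` can affect accuracy, which is one of the parameters an evaluation on a corpus like Prothom-Alo would need to explore.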

Keywords

Computer science and engineering

Description

This thesis report is submitted in partial fulfillment of the requirements for the degree of Bachelor of Science in Computer Science and Engineering, 2006.

Cataloged from PDF version of thesis report.

Includes bibliographical references (page 30).

Department

Department of Computer Science and Engineering, BRAC University

Type

Thesis

Collections

Thesis & Report, BSc (Computer Science and Engineering)