Text classification using machine learning algorithms

Hasnat, Fahim; Hasan, Md. Mazidul; Khan, Nayeem Hasan; Ali, Asif

Date

8/2/2018

Publisher

BRAC University

Abstract

Financial, educational and communal activities produce enormous amounts of data. Automatic text classification has significant applications in content organization, point-of-view extraction, evaluation analysis, spam filtering and sentiment analysis. Automatic classification of text documents requires information extraction, machine learning and natural language processing. We propose a probabilistic framework for text classification. The proposed classification model is composed of three major modules: pre-processing of unstructured text, learning of a probabilistic model, and classification of unseen data using the learned model. The framework is trained and tested on the “20 Newsgroups” dataset, which contains twenty different news categories such as politics, sports, religion and PC hardware. We used both supervised and unsupervised algorithms to gain insight into the relationships among various text classification techniques. Among the supervised algorithms we used, Naïve Bayes achieved the highest accuracy, 84.51% on 4 categories; the unsupervised algorithms achieved a homogeneity of 62.79% on 4 categories. These results demonstrate the effectiveness of the proposed automatic text classification approach.
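The three modules named in the abstract (pre-processing, learning a probabilistic model, classifying unseen text) can be illustrated with a minimal multinomial Naïve Bayes sketch. This is not the thesis implementation: the stop-word list, the function names `preprocess`, `train` and `classify`, the add-one smoothing, and the four toy documents are all assumptions made for illustration, and the sketch does not reproduce the reported results.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical stop-word list, for illustration only.
STOP_WORDS = {"the", "a", "an", "is", "of", "and", "to", "in", "for", "on"}


def preprocess(text):
    """Module 1: lowercase, keep alphabetic tokens, drop stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def train(docs):
    """Module 2: count documents per class and words per class."""
    class_docs = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in docs:
        tokens = preprocess(text)
        class_docs[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_docs, word_counts, vocab


def classify(text, model):
    """Module 3: return the class with the highest log-posterior."""
    class_docs, word_counts, vocab = model
    total_docs = sum(class_docs.values())
    tokens = [t for t in preprocess(text) if t in vocab]  # skip unseen words
    best_label, best_score = None, -math.inf
    for label in class_docs:
        # Log prior: fraction of training documents in this class.
        score = math.log(class_docs[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for token in tokens:
            # Add-one (Laplace) smoothing, an assumption of this sketch.
            count = word_counts[label][token] + 1
            score += math.log(count / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy training data invented for this sketch (not from 20 Newsgroups).
TOY_DOCS = [
    ("The team won the hockey game in overtime", "sports"),
    ("The pitcher threw a perfect baseball game", "sports"),
    ("My motherboard and graphics card need a new driver", "hardware"),
    ("The hard drive controller failed on my pc", "hardware"),
]

model = train(TOY_DOCS)
print(classify("hockey game tonight", model))
print(classify("new graphics card driver", model))
```

Log-probabilities are summed rather than multiplying raw probabilities so that long documents do not underflow to zero.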

Keywords

Text classification; Machine learning; Pre-processing; Feature extraction; Naïve Bayes; Decision tree

LC Subject Headings

Machine learning; Text processing (Computer science)

Description

Cataloged from PDF version of thesis.

Includes bibliographical references (pages 43-46).

This thesis is submitted in partial fulfilment of the requirements for the degree of Bachelor of Science in Computer Science and Engineering, 2018.

Department

Department of Computer Science and Engineering, BRAC University

Type

Thesis

Collections

Thesis & Report, BSc (Computer Science and Engineering) [1390]