Text classification using machine learning algorithms

Hasnat, Fahim; Hasan, Md. Mazidul; Khan, Nayeem Hasan; Ali, Asif

dc.contributor.advisor	Chakrabarty, Amitabha
dc.contributor.author	Hasnat, Fahim
dc.contributor.author	Hasan, Md. Mazidul
dc.contributor.author	Khan, Nayeem Hasan
dc.contributor.author	Ali, Asif
dc.date.accessioned	2018-12-18T10:46:31Z
dc.date.available	2018-12-18T10:46:31Z
dc.date.copyright	2018
dc.date.issued	8/2/2018
dc.identifier.other	ID 14101043
dc.identifier.other	ID 14301104
dc.identifier.other	ID 14301113
dc.identifier.other	ID 12201068
dc.identifier.uri	http://hdl.handle.net/10361/11026
dc.description	Cataloged from PDF version of thesis.
dc.description	Includes bibliographical references (pages 43-46).
dc.description	This thesis is submitted in partial fulfilment of the requirements for the degree of Bachelor of Science in Computer Science and Engineering, 2018.	en_US
dc.description.abstract	Financial, educational and communal activities produce an enormous amount of data. Automatic text classification has significant applications in content organization, point-of-view extraction, evaluation analysis, spam filtering and sentiment analysis. Automatic classification of text documents requires information extraction, machine learning and natural language processing. We propose a probabilistic framework for text classification. The proposed classification model is composed of three major modules: pre-processing of unstructured text, learning of a probabilistic model, and classification of unseen data using the learned model. The framework is trained and tested on the “20 newsgroup” dataset, which contains twenty different news categories, e.g. politics, sports, religion and PC hardware. We used both supervised and unsupervised algorithms to gain insight into the relationships among various text classification techniques. Among the supervised algorithms we used, Naïve Bayes achieved the highest accuracy, 84.51% for 4 categories; the unsupervised algorithms achieved 62.79% homogeneity for 4 categories. These results demonstrate the effectiveness of the proposed automatic text classification approach.	en_US
dc.description.statementofresponsibility	Fahim Hasnat
dc.description.statementofresponsibility	Md. Mazidul Hasan
dc.description.statementofresponsibility	Nayeem Hasan Khan
dc.description.statementofresponsibility	Asif Ali
dc.format.extent	46 pages
dc.language.iso	en	en_US
dc.publisher	BRAC University	en_US
dc.rights	BRAC University theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission.
dc.subject	Text classification	en_US
dc.subject	Machine learning	en_US
dc.subject	Pre-processing	en_US
dc.subject	Feature extraction	en_US
dc.subject	Naïve bayes	en_US
dc.subject	Decision tree	en_US
dc.subject.lcsh	Machine learning.
dc.subject.lcsh	Text processing (Computer science)
dc.title	Text classification using machine learning algorithms	en_US
dc.type	Thesis
dc.contributor.department	Department of Computer Science and Engineering, BRAC University
dc.description.degree	B. Computer Science and Engineering
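
The three modules named in the abstract (pre-processing, learning of a probabilistic model, classification of unseen text) can be illustrated with a minimal multinomial Naive Bayes sketch. This is not the thesis's implementation: the function names, the regex tokenizer, the add-one (Laplace) smoothing and the four-sentence toy corpus are all assumptions made for illustration, and the corpus is invented rather than drawn from the 20 newsgroup data.

```python
import math
import re
from collections import Counter, defaultdict


def preprocess(text):
    """Module 1 (assumed form): lowercase and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


def train(docs, labels):
    """Module 2: collect class frequencies and per-class word counts."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in zip(docs, labels):
        tokens = preprocess(text)
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab


def classify(text, model):
    """Module 3: return the class with the highest log-posterior score."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        score = math.log(n_docs / total_docs)  # log prior
        total_words = sum(word_counts[label].values())
        for tok in preprocess(text):
            if tok not in vocab:
                continue  # ignore words never seen in training
            # Add-one smoothing keeps unseen (word, class) pairs non-zero.
            score += math.log(
                (word_counts[label][tok] + 1) / (total_words + len(vocab))
            )
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy corpus, invented for this example.
docs = [
    "The team won the match in the final minute",
    "The striker scored a goal for the team",
    "The new graphics card needs a bigger power supply",
    "My motherboard and graphics card are not compatible",
]
labels = ["sports", "sports", "hardware", "hardware"]

model = train(docs, labels)
print(classify("Which graphics card fits this motherboard?", model))  # hardware
print(classify("The team scored a late goal", model))  # sports
```

Working in log space avoids numeric underflow when many word probabilities are multiplied; a real experiment would also hold out test documents to measure accuracy, which this sketch does not do.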

Files in this item

Name: 14101043,14301104,14301113,122 ...
Size: 866.8Kb
Format: PDF

This item appears in the following Collection(s)

Thesis & Report, BSc (Computer Science and Engineering) [1480]
