Sentiment analysis for Bangla microblog posts

Chowdhury, Shaika; Chowdhury, Wasifa

dc.contributor.author	Chowdhury, Shaika
dc.contributor.author	Chowdhury, Wasifa
dc.date.accessioned	2014-01-29T06:58:08Z
dc.date.available	2014-01-29T06:58:08Z
dc.date.issued	2014-01
dc.identifier.other	ID 10101037
dc.identifier.other	ID 10101038
dc.identifier.uri	http://hdl.handle.net/10361/2902
dc.description	This thesis report is submitted in partial fulfillment of the requirements for the degree of Bachelor of Science in Computer Science and Engineering, 2014.	en_US
dc.description	Cataloged from PDF version of thesis report.
dc.description	Includes bibliographical references (page 47).
dc.description.abstract	Sentiment analysis has received considerable attention recently because of the large volume of user-generated content on microblogging sites such as Twitter [1], which is used in applications such as product review mining and event prediction, for example forecasting election results. Most research on sentiment analysis has targeted English, but resources and tools for other languages are increasingly needed, since microblog posts are written in many languages. Work on Bangla (Bengali) is necessary because it is one of the most widely spoken languages, ranked seventh in the world [13]. In this paper, we aim to automatically extract the sentiments or opinions that users express in Bangla microblog posts and to classify the overall polarity of each text as either negative or positive. We use a semi-supervised bootstrapping approach to build the training corpus, which avoids the need for labor-intensive manual annotation. For classification, we use Support Vector Machines (SVM) and Maximum Entropy (MaxEnt) and compare the performance of these two machine learning algorithms across combinations of various feature sets. We also construct a Twitter-specific Bangla sentiment lexicon, which is used by the rule-based classifier and as a binary feature in the classifiers. We choose Twitter as the microblogging site because it is one of the most popular microblogging platforms in the world.	en_US
dc.description.statementofresponsibility	Chowdhury, Shaika
dc.description.statementofresponsibility	Chowdhury, Wasifa
dc.format.extent	47 pages
dc.language.iso	en	en_US
dc.publisher	BRAC University	en_US
dc.rights	BRAC University theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission.
dc.subject	Computer science and engineering
dc.title	Sentiment analysis for Bangla microblog posts	en_US
dc.type	Thesis	en_US
dc.contributor.department	Department of Computer Science and Engineering, BRAC University
dc.description.degree	B. Computer Science and Engineering

Files in this item

Name: 10101037 & 10101038.pdf
Size: 1.918Mb
Format: PDF


This item appears in the following Collection(s)

Thesis & Report, BSc (Computer Science and Engineering) [1480]
