Comparison of different POS tagging techniques for some South Asian languages

Hasan, Fahim Muhammad

dc.contributor.advisor	Khan, Mumit
dc.contributor.advisor	UzZaman, Naushad
dc.contributor.author	Hasan, Fahim Muhammad
dc.date.accessioned	2010-09-16T07:14:09Z
dc.date.available	2010-09-16T07:14:09Z
dc.date.copyright	2006
dc.date.issued	2006-12
dc.identifier.other	ID 03101057
dc.identifier.uri	http://hdl.handle.net/10361/83
dc.description	This thesis report is submitted in partial fulfillment of the requirements for the degree of Bachelor of Science in Computer Science and Engineering, 2006.	en_US
dc.description	Cataloged from PDF version of thesis report.
dc.description	Includes bibliographical references (page 47).
dc.description.abstract	There are different approaches to the problem of assigning a part of speech (POS) tag to each word of a natural language sentence. We present a comparison of different POS tagging approaches for Bangla and two other South Asian languages, as well as the baseline performance of different POS tagging techniques for English. The most widely used methods for English are statistical methods, i.e. n-gram based or Hidden Markov Model (HMM) based tagging, and rule based or transformation based methods, i.e. Brill’s tagger. Subsequent research adds various modifications to these basic approaches to improve the performance of taggers for English. Here, we present a detailed review of previous work in the area, with a focus on South Asian languages such as Hindi and Bangla. We experiment with Brill’s transformation based tagger and the supervised HMM based tagger, without modifications for added improvement in accuracy, on English, using training corpora of different sizes from the Brown corpus. We also compare the performance of these taggers on three South Asian languages, with a focus on Bangla, using two different tagsets and corpora of different sizes; the comparison reveals that Brill’s transformation based tagger performs considerably well for South Asian languages. We also check the baseline performance of the taggers for English and try to conclude how these approaches might perform given a considerable amount of annotated training corpus.	en_US
dc.description.statementofresponsibility	Fahim Muhammad Hasan
dc.format.extent	63 pages
dc.publisher	BRAC University	en_US
dc.rights	BRAC University theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission.
dc.subject	Computer science and engineering
dc.title	Comparison of different POS tagging techniques for some South Asian languages	en_US
dc.type	Thesis	en_US
dc.contributor.department	Department of Computer Science and Engineering, BRAC University
dc.description.degree	B. Computer Science and Engineering

Files in this item

Name: Comparison of diffferent pos ...
Size: 489.9Kb
Format: PDF

This item appears in the following Collection(s)

Thesis & Report, BSc (Computer Science and Engineering) [1589]
