dc.contributor.advisor | Khan, Mumit | |
dc.contributor.advisor | UzZaman, Naushad | |
dc.contributor.author | Hasan, Fahim Muhammad | |
dc.date.accessioned | 2010-09-16T07:14:09Z | |
dc.date.available | 2010-09-16T07:14:09Z | |
dc.date.copyright | 2006 | |
dc.date.issued | 2006-12 | |
dc.identifier.other | ID 03101057 | |
dc.identifier.uri | http://hdl.handle.net/10361/83 | |
dc.description | This thesis report is submitted in partial fulfillment of the requirements for the degree of Bachelor of Science in Computer Science and Engineering, 2006. | en_US |
dc.description | Cataloged from PDF version of thesis report. | |
dc.description | Includes bibliographical references (page 47). | |
dc.description.abstract | There are different approaches to the problem of assigning a
part-of-speech (POS) tag to each word of a natural language sentence. We
present a comparison of different approaches to POS tagging for the
Bangla language and two other South Asian languages, as well as the
baseline performances of different POS tagging techniques for the
English language. The most widely used methods for English are
statistical methods, such as n-gram based tagging or Hidden Markov Model
(HMM) based tagging, and rule based or transformation based methods,
such as Brill’s tagger. Subsequent research adds various modifications to
these basic approaches to improve the performance of the taggers for
English. Here, we present an elaborate review of previous work in the
area, with a focus on South Asian languages such as Hindi and
Bangla. We experiment with Brill’s transformation based tagger and the
supervised HMM based tagger, both without accuracy-improving
modifications, on English using training corpora of different
sizes from the Brown corpus. We also compare the performance of
these taggers on three South Asian languages, with a focus on Bangla,
using two different tagsets and corpora of different sizes; this comparison
reveals that Brill’s transformation based tagger performs quite well for
South Asian languages. We also check the baseline performance of the
taggers for English and try to estimate how these approaches might
perform given a considerable amount of annotated training data. | en_US |
dc.description.statementofresponsibility | Fahim Muhammad Hasan | |
dc.format.extent | 63 pages | |
dc.publisher | BRAC University | en_US |
dc.rights | BRAC University theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission. | |
dc.subject | Computer science and engineering | |
dc.title | Comparison of different POS tagging techniques for some South Asian languages | en_US |
dc.type | Thesis | en_US |
dc.contributor.department | Department of Computer Science and Engineering, BRAC University | |
dc.description.degree | B. Computer Science and Engineering | |