Show simple item record

dc.contributor.advisor | Ashraf, Faisal Bin
dc.contributor.author | Khan, Towhid
dc.contributor.author | Mallick, David Dew
dc.contributor.author | Khan, Md.Shakiful Islam
dc.contributor.author | Hasan, Md Mahadi
dc.date.accessioned | 2023-10-17T08:43:07Z
dc.date.available | 2023-10-17T08:43:07Z
dc.date.copyright | ©2022
dc.date.issued | 2022-09-28
dc.identifier.other | ID 18201035
dc.identifier.other | ID 18201045
dc.identifier.other | ID 18201198
dc.identifier.other | ID 18201062
dc.identifier.uri | http://hdl.handle.net/10361/21865
dc.description | This thesis is submitted in partial fulfillment of the requirements for the degree of Bachelor of Science in Computer Science, 2022. | en_US
dc.description | Cataloged from PDF version of thesis.
dc.description | Includes bibliographical references (pages 43-44).
dc.description.abstract | The procedure of removing extraneous textual elements and preparing the values to be fed into a classifier model is commonly referred to as text preprocessing. There are several preprocessing methods; however, not all of them are effective when used with cross-language and multilingual datasets. Running a cross-lingual or multilingual dataset through a single preprocessing method and text classification model is rather challenging. What if a technique could be used to better classify data from multilingual and cross-lingual datasets? To speed up the process of improving accuracy, we tested various combinations of data preprocessing with text classification models on datasets in Bangla, English, and cross-lingual text (native language written in English letters). We may infer from our experiment that mLSTM functioned effectively for datasets in Bangla and English. Thus, mLSTM can be a helpful preprocessing method for datasets containing a variety of languages. | en_US
dc.description.statementofresponsibility | Towhid Khan
dc.description.statementofresponsibility | David Dew Mallick
dc.description.statementofresponsibility | Md.Shakiful Islam Khan
dc.description.statementofresponsibility | Md Mahadi Hasan
dc.format.extent | 54 pages
dc.language.iso | en | en_US
dc.publisher | Brac University | en_US
dc.rights | Brac University theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission.
dc.subject | Random forest | en_US
dc.subject | Logistic regression | en_US
dc.subject | TF-IDF | en_US
dc.subject | SVM | en_US
dc.subject | XGB | en_US
dc.subject | mLSTM | en_US
dc.subject | LSTM | en_US
dc.subject | Information retrieval | en_US
dc.subject | Sentiment analysis | en_US
dc.subject | NLP | en_US
dc.subject.lcsh | Natural language processing (Computer science)
dc.subject.lcsh | Computational linguistics--Congresses
dc.title | Text classification with an efficient preprocessing technique for cross-language and multilingual data | en_US
dc.type | Thesis | en_US
dc.contributor.department | Department of Computer Science and Engineering, Brac University
dc.description.degree | B.Sc. in Computer Science

