dc.contributor.advisor | Ashraf, Faisal Bin | |
dc.contributor.author | Khan, Towhid | |
dc.contributor.author | Mallick, David Dew | |
dc.contributor.author | Khan, Md.Shakiful Islam | |
dc.contributor.author | Hasan, Md Mahadi | |
dc.date.accessioned | 2023-10-17T08:43:07Z | |
dc.date.available | 2023-10-17T08:43:07Z | |
dc.date.copyright | ©2022 | |
dc.date.issued | 2022-09-28 | |
dc.identifier.other | ID 18201035 | |
dc.identifier.other | ID 18201045 | |
dc.identifier.other | ID 18201198 | |
dc.identifier.other | ID 18201062 | |
dc.identifier.uri | http://hdl.handle.net/10361/21865 | |
dc.description | This thesis is submitted in partial fulfillment of the requirements for the degree of Bachelor of Science in Computer Science, 2022. | en_US |
dc.description | Cataloged from PDF version of thesis. | |
dc.description | Includes bibliographical references (pages 43-44). | |
dc.description.abstract | The procedure of removing extraneous textual elements and preparing or processing the values to be fed into a classifier model is commonly referred to as text preprocessing. There are several preprocessing methods, but not all of them are effective on cross-language and multilingual datasets. Running a cross-lingual or multilingual dataset through a single preprocessing method and text classification model is rather challenging. What if a technique could be used to better classify data from multilingual and cross-lingual datasets? To accelerate the process of improving accuracy, we tested various combinations of data preprocessing methods and text classification models on Bangla, English, and cross-lingual (native language written in English letters) datasets. From our experiments, we infer that mLSTM performed effectively on the Bangla and English datasets. Thus, mLSTM can be a helpful preprocessing method for datasets containing a variety of languages. | en_US |
dc.description.statementofresponsibility | Towhid Khan | |
dc.description.statementofresponsibility | David Dew Mallick | |
dc.description.statementofresponsibility | Md.Shakiful Islam Khan | |
dc.description.statementofresponsibility | Md Mahadi Hasan | |
dc.format.extent | 54 pages | |
dc.language.iso | en | en_US |
dc.publisher | Brac University | en_US |
dc.rights | Brac University theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission. | |
dc.subject | Random forest | en_US |
dc.subject | Logistic regression | en_US |
dc.subject | TF-IDF | en_US |
dc.subject | SVM | en_US |
dc.subject | XGB | en_US |
dc.subject | mLSTM | en_US |
dc.subject | LSTM | en_US |
dc.subject | Information retrieval | en_US |
dc.subject | Sentiment analysis | en_US |
dc.subject | NLP | en_US |
dc.subject.lcsh | Natural language processing (Computer science) | |
dc.subject.lcsh | Computational linguistics--Congresses | |
dc.title | Text classification with an efficient preprocessing technique for cross-language and multilingual data | en_US |
dc.type | Thesis | en_US |
dc.contributor.department | Department of Computer Science and Engineering, Brac University | |
dc.description.degree | B.Sc. in Computer Science | |