dc.contributor.advisor: Sadeque, Farig Yousuf
dc.contributor.author: Billah, Syed Mohammed Mostaque
dc.contributor.author: Subarna, Ateya Ahmed
dc.contributor.author: Sarna, Sudipta Nandi
dc.contributor.author: Wasit, Ahmad Shawkat
dc.contributor.author: Shawkat, Ahmad
dc.date.accessioned: 2024-06-26T07:24:09Z
dc.date.available: 2024-06-26T07:24:09Z
dc.date.copyright: ©2023
dc.date.issued: 2023-09
dc.identifier.other: ID 20101057
dc.identifier.other: ID 23341089
dc.identifier.other: ID 20101257
dc.identifier.other: ID 20101398
dc.identifier.other: ID 20101042
dc.identifier.uri: http://hdl.handle.net/10361/23605
dc.description: This thesis is submitted in partial fulfillment of the requirements for the degree of Bachelor of Science in Computer Science and Engineering, 2023. [en_US]
dc.description: Cataloged from PDF version of thesis.
dc.description: Includes bibliographical references (pages 38-39).
dc.description.abstract: Around seven million people in India, Bangladesh, Bhutan, and Nepal speak Santali, making it roughly the third most widely spoken Austroasiatic language. Despite its prominence within the Munda subfamily of the Austroasiatic language family, Santali lacks global recognition. Currently, no translation models exist for the Santali language. This paper aims to move Santali out of the low-resource end of the NLP spectrum. We examine the feasibility of building Santali-English translation models based on available Santali corpora. The paper addresses the low-resource problem and, with promising results, examines the possibility of using the Santali language in machine translation. We think that our study will open the door for further exploration into Santali-English machine translation. [en_US]
dc.description.statementofresponsibility: Syed Mohammed Mostaque Billah
dc.description.statementofresponsibility: Ateya Ahmed Subarna
dc.description.statementofresponsibility: Sudipta Nandi Sarna
dc.description.statementofresponsibility: Ahmad Shawkat Wasit
dc.description.statementofresponsibility: Anika Fariha Chowdhury
dc.format.extent: 45 pages
dc.language.iso: en [en_US]
dc.publisher: Brac University [en_US]
dc.rights: Brac University theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission.
dc.subject: Parallel corpus [en_US]
dc.subject: Machine translation [en_US]
dc.subject: Neural Machine Translation [en_US]
dc.subject: Low resource language [en_US]
dc.subject: Aligner [en_US]
dc.subject.lcsh: Computer linguistics
dc.title: Towards Santali linguistic inclusion: building the first Santali-to-English translation model using mT5 transformer and data augmentation [en_US]
dc.type: Thesis [en_US]
dc.contributor.department: Department of Computer Science and Engineering, Brac University
dc.description.degree: B.Sc in Computer Science

