dc.contributor.advisor | Sadeque, Farig Yousuf | |
dc.contributor.author | Billah, Syed Mohammed Mostaque | |
dc.contributor.author | Subarna, Ateya Ahmed | |
dc.contributor.author | Sarna, Sudipta Nandi | |
dc.contributor.author | Wasit, Ahmad Shawkat | |
dc.contributor.author | Chowdhury, Anika Fariha | |
dc.date.accessioned | 2024-06-26T07:24:09Z | |
dc.date.available | 2024-06-26T07:24:09Z | |
dc.date.copyright | ©2023 | |
dc.date.issued | 2023-09 | |
dc.identifier.other | ID 20101057 | |
dc.identifier.other | ID 23341089 | |
dc.identifier.other | ID 20101257 | |
dc.identifier.other | ID 20101398 | |
dc.identifier.other | ID 20101042 | |
dc.identifier.uri | http://hdl.handle.net/10361/23605 | |
dc.description | This thesis is submitted in partial fulfillment of the requirements for the degree of Bachelor of Science in Computer Science and Engineering, 2023. | en_US |
dc.description | Cataloged from PDF version of thesis. | |
dc.description | Includes bibliographical references (pages 38-39). | |
dc.description.abstract | Around seven million people in India, Bangladesh, Bhutan, and Nepal speak
Santali, making it roughly the third most widely spoken Austroasiatic language.
Despite its prominence within the Munda subfamily of the Austroasiatic language
family, Santali lacks global recognition, and no translation models currently exist
for it. This paper aims to bring Santali into the NLP spectrum by examining the
feasibility of building Santali-English translation models from the available Santali
corpora. The study addresses the low-resource problem and, with promising results,
demonstrates the potential of using Santali in machine translation. We believe this
work opens the door to further exploration of Santali-English machine translation. | en_US |
dc.description.statementofresponsibility | Syed Mohammed Mostaque Billah | |
dc.description.statementofresponsibility | Ateya Ahmed Subarna | |
dc.description.statementofresponsibility | Sudipta Nandi Sarna | |
dc.description.statementofresponsibility | Ahmad Shawkat Wasit | |
dc.description.statementofresponsibility | Anika Fariha Chowdhury | |
dc.format.extent | 45 pages | |
dc.language.iso | en | en_US |
dc.publisher | Brac University | en_US |
dc.rights | Brac University theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission. | |
dc.subject | Parallel corpus | en_US |
dc.subject | Machine translation | en_US |
dc.subject | Neural Machine Translation | en_US |
dc.subject | Low resource language | en_US |
dc.subject | Aligner | en_US |
dc.subject.lcsh | Computational linguistics | |
dc.title | Towards Santali linguistic inclusion: building the first Santali-to-English translation model using mT5 transformer and data augmentation | en_US |
dc.type | Thesis | en_US |
dc.contributor.department | Department of Computer Science and Engineering, Brac University | |
dc.description.degree | B.Sc. in Computer Science | |