Towards Santali linguistic inclusion: building the first Santali-to-English translation model using mT5 transformer and data augmentation

Billah, Syed Mohammed Mostaque; Subarna, Ateya Ahmed; Sarna, Sudipta Nandi; Wasit, Ahmad Shawkat; Shawkat, Ahmad

dc.contributor.advisor	Sadeque, Farig Yousuf
dc.contributor.author	Billah, Syed Mohammed Mostaque
dc.contributor.author	Subarna, Ateya Ahmed
dc.contributor.author	Sarna, Sudipta Nandi
dc.contributor.author	Wasit, Ahmad Shawkat
dc.contributor.author	Shawkat, Ahmad
dc.date.accessioned	2024-06-26T07:24:09Z
dc.date.available	2024-06-26T07:24:09Z
dc.date.copyright	©2023
dc.date.issued	2023-09
dc.identifier.other	ID 20101057
dc.identifier.other	ID 23341089
dc.identifier.other	ID 20101257
dc.identifier.other	ID 20101398
dc.identifier.other	ID 20101042
dc.identifier.uri	http://hdl.handle.net/10361/23605
dc.description	This thesis is submitted in partial fulfillment of the requirements for the degree of Bachelor of Science in Computer Science and Engineering, 2023.	en_US
dc.description	Cataloged from PDF version of thesis.
dc.description	Includes bibliographical references (pages 38-39).
dc.description.abstract	Around seven million individuals in India, Bangladesh, Bhutan, and Nepal speak Santali, positioning it as nearly the third most commonly used Austroasiatic language. Despite its prominence among the Austroasiatic language family’s Munda subfamily, Santali lacks global recognition. Currently, no translation models exist for the Santali language. This paper aims to remove Santali from the NLP spectrum. We aim to examine the feasibility of building Santali-English translation models based on available Santali corpora. This paper successfully addressed the low-resource problem and, with promising results, examined the possibility of using the Santali language. We think that our study will open the door for further exploration into Santali-English machine translation.	en_US
dc.description.statementofresponsibility	Syed Mohammed Mostaque Billah
dc.description.statementofresponsibility	Ateya Ahmed Subarna
dc.description.statementofresponsibility	Sudipta Nandi Sarna
dc.description.statementofresponsibility	Ahmad Shawkat Wasit
dc.description.statementofresponsibility	Anika Fariha Chowdhury
dc.format.extent	45 pages
dc.language.iso	en	en_US
dc.publisher	Brac University	en_US
dc.rights	Brac University theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission.
dc.subject	Parallel corpus	en_US
dc.subject	Machine translation	en_US
dc.subject	Neural Machine Translation	en_US
dc.subject	Low resource language	en_US
dc.subject	Aligner	en_US
dc.subject.lcsh	Computational linguistics
dc.title	Towards Santali linguistic inclusion: building the first Santali-to-English translation model using mT5 transformer and data augmentation	en_US
dc.type	Thesis	en_US
dc.contributor.department	Department of Computer Science and Engineering, Brac University
dc.description.degree	B.Sc in Computer Science

Files in this item

Name: 20101057_23341089_20101257_201 ...
Size: 2.022Mb
Format: PDF

This item appears in the following Collection(s)

Thesis & Report, BSc (Computer Science and Engineering) [1402]
