dc.contributor.advisor | Sadeque, Farig Yousuf | |
dc.contributor.author | Rodoshi, Mashiat Hasin | |
dc.contributor.author | Ahmed, Moin Uddin | |
dc.contributor.author | Ashraf, Md. Sobhan | |
dc.contributor.author | Mim, Md. Galib Hasan | |
dc.contributor.author | Khanam, Ashfia | |
dc.date.accessioned | 2023-12-20T05:06:39Z | |
dc.date.available | 2023-12-20T05:06:39Z | |
dc.date.copyright | 2023 | |
dc.date.issued | 2023-01 | |
dc.identifier.other | ID 19201089 | |
dc.identifier.other | ID 19301095 | |
dc.identifier.other | ID 19301046 | |
dc.identifier.other | ID 19301094 | |
dc.identifier.other | ID 18301231 | |
dc.identifier.uri | http://hdl.handle.net/10361/22012 | |
dc.description | This thesis is submitted in partial fulfillment of the requirements for the degree of Bachelor of Science in Computer Science, 2023. | en_US |
dc.description | Cataloged from PDF version of thesis. | |
dc.description | Includes bibliographical references (pages 37-38). | |
dc.description.abstract | Experiencing an image on-screen is a privilege that we often seem not to care about.
A visually impaired person does not have that luxury. A system that can automatically
produce closed captions of an image can thus help visually impaired people
experience what’s appearing on a digital screen. Research in this area has been
at the forefront of multimodal machine learning for quite some time, but while a
plethora of languages has benefited from all that research, Bangla has been left
behind. In this thesis, we aim to build a Bangla caption generator that uses
multimodal learning to automatically produce accurate captions
in Bangla for digital images. The generator uses neural networks to identify the
objects in an image, the relations among them, and the actions taking place.
Combining this information, it constructs
an information-rich, descriptive caption for the image. These captions can later be
read aloud so that visually impaired people can get an idea about what is happening
around them. This thesis aims to improve upon existing
Bangla image caption generators, both to help improve the lives
of visually impaired people and to advance this research towards the state of the
art. We use the Flickr8k and Flickr30k datasets, which contain 8,091 and 31,783
images respectively, with five Bangla captions for each image. We use
the VGG16, VGG19, ResNet50, InceptionV3 and EfficientNetB3 CNN architectures
for feature extraction. Our best model achieved BLEU-1, BLEU-2, BLEU-3
and BLEU-4 scores of 0.553197, 0.341976, 0.234436 and 0.113089 respectively. | en_US |
dc.description.statementofresponsibility | Mashiat Hasin Rodoshi | |
dc.description.statementofresponsibility | Moin Uddin Ahmed | |
dc.description.statementofresponsibility | Md. Sobhan Ashraf | |
dc.description.statementofresponsibility | Md. Galib Hasan Mim | |
dc.description.statementofresponsibility | Ashfia Khanam | |
dc.format.extent | 38 pages | |
dc.language.iso | en | en_US |
dc.publisher | Brac University | en_US |
dc.rights | Brac University theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission. | |
dc.subject | Image captioning | en_US |
dc.subject | CNN | en_US |
dc.subject | LSTM | en_US |
dc.subject | RNN | en_US |
dc.subject | Deep learning | en_US |
dc.subject | Bangla | en_US |
dc.subject | Natural language processing | en_US |
dc.subject.lcsh | Machine learning | |
dc.subject.lcsh | Neural networks (Computer science) | |
dc.subject.lcsh | Natural language processing (Computer science) | |
dc.subject.lcsh | Cognitive learning theory | |
dc.title | Automated image caption generator in Bangla using multimodal learning | en_US |
dc.type | Thesis | en_US |
dc.contributor.department | Department of Computer Science and Engineering, Brac University | |
dc.description.degree | B.Sc. in Computer Science and Engineering | |