Show simple item record

dc.contributor.advisor: Sadeque, Farig Yousuf
dc.contributor.author: Rodoshi, Mashiat Hasin
dc.contributor.author: Ahmed, Moin Uddin
dc.contributor.author: Ashraf, Md. Sobhan
dc.contributor.author: Mim, Md. Galib Hasan
dc.contributor.author: Khanam, Ashfia
dc.date.accessioned: 2023-12-20T05:06:39Z
dc.date.available: 2023-12-20T05:06:39Z
dc.date.copyright: 2023
dc.date.issued: 2023-01
dc.identifier.other: ID 19201089
dc.identifier.other: ID 19301095
dc.identifier.other: ID 19301046
dc.identifier.other: ID 19301094
dc.identifier.other: ID 18301231
dc.identifier.uri: http://hdl.handle.net/10361/22012
dc.description: This thesis is submitted in partial fulfillment of the requirements for the degree of Bachelor of Science in Computer Science, 2023. [en_US]
dc.description: Cataloged from PDF version of thesis.
dc.description: Includes bibliographical references (pages 37-38).
dc.description.abstract: Experiencing an image on-screen is a privilege we rarely think about. A visually impaired person does not have that luxury. A system that automatically produces closed captions for an image can therefore help visually impaired people experience what appears on a digital screen. Research in this area has been at the forefront of multimodal machine learning for quite some time, but while many languages have benefited from that research, Bangla has been left behind. For our thesis, we build a Bangla caption generator using multimodal learning that automatically produces accurate closed captions in Bangla for digital images. Using neural networks, the generator identifies the objects in an image, the relations among them, and the actions taking place. By combining this information, it constructs an information-rich, descriptive caption for the image. These captions can later be read aloud so that visually impaired people can get an idea of what is happening around them. This thesis aims to improve upon existing Bangla image caption generators, both to help improve the lives of visually impaired people and to advance this research toward the state of the art. We have used the Flickr8k and Flickr30k datasets, containing 8091 and 31783 images respectively, with five Bangla captions per image. We have used the VGG16, VGG19, ResNet50, InceptionV3, and EfficientNetB3 CNN architectures for feature extraction. Our best model achieved BLEU-1, BLEU-2, BLEU-3, and BLEU-4 scores of 0.553197, 0.341976, 0.234436, and 0.113089 respectively. [en_US]
dc.description.statementofresponsibility: Mashiat Hasin Rodoshi
dc.description.statementofresponsibility: Moin Uddin Ahmed
dc.description.statementofresponsibility: Md. Sobhan Ashraf
dc.description.statementofresponsibility: Md. Galib Hasan Mim
dc.description.statementofresponsibility: Ashfia Khanam
dc.format.extent: 38 pages
dc.language.iso: en [en_US]
dc.publisher: Brac University [en_US]
dc.rights: Brac University theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission.
dc.subject: Image captioning [en_US]
dc.subject: CNN [en_US]
dc.subject: LSTM [en_US]
dc.subject: RNN [en_US]
dc.subject: Deep learning [en_US]
dc.subject: Bangla [en_US]
dc.subject: Natural language processing [en_US]
dc.subject.lcsh: Machine learning
dc.subject.lcsh: Neural networks (Computer science)
dc.subject.lcsh: Natural language processing (Computer science)
dc.subject.lcsh: Cognitive learning theory
dc.title: Automated image caption generator in Bangla using multimodal learning [en_US]
dc.type: Thesis [en_US]
dc.contributor.department: Department of Computer Science and Engineering, Brac University
dc.description.degree: B.Sc. in Computer Science and Engineering
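The abstract reports cumulative BLEU-1 through BLEU-4 scores against the five Bangla reference captions per image. The thesis's own evaluation code is not part of this record; the following is a minimal, self-contained sketch of the standard BLEU computation (clipped n-gram precision, geometric mean, brevity penalty) that such scores are based on. The function names `ngrams` and `bleu` are illustrative, not taken from the thesis.

```python
from collections import Counter
import math


def ngrams(tokens, n):
    """All contiguous n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def bleu(candidate, references, max_n=4):
    """Cumulative BLEU with uniform weights and a brevity penalty.

    `candidate` is a token list; `references` is a list of token lists
    (e.g. the five Bangla reference captions for one image).
    """
    log_sum = 0.0
    for n in range(1, max_n + 1):
        cand_counts = Counter(ngrams(candidate, n))
        # Clip each candidate n-gram count by its maximum count in any reference.
        max_ref = Counter()
        for ref in references:
            for gram, count in Counter(ngrams(ref, n)).items():
                max_ref[gram] = max(max_ref[gram], count)
        clipped = sum(min(c, max_ref[g]) for g, c in cand_counts.items())
        total = sum(cand_counts.values())
        if total == 0 or clipped == 0:
            return 0.0  # any zero precision makes the geometric mean zero
        log_sum += math.log(clipped / total) / max_n
    # Brevity penalty against the reference whose length is closest to the candidate's.
    c_len = len(candidate)
    r_len = min((abs(len(r) - c_len), len(r)) for r in references)[1]
    bp = 1.0 if c_len > r_len else math.exp(1.0 - r_len / c_len)
    return bp * math.exp(log_sum)
```

For example, `bleu("a dog runs".split(), [["a", "dog", "runs"]], max_n=1)` yields 1.0, since every unigram matches and the lengths agree. Library implementations such as `nltk.translate.bleu_score` add smoothing options, but the core computation is the same.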

