Automatic Bengali image captioning using efficientNet-transformer network and vision transformer

Kabir, Muhammad Khubayeeb; Labonno, Anindita; Amin, Sofia; Tahsin, Fariha

dc.contributor.advisor	Rahman, Md. Khalilur
dc.contributor.advisor	Mostakim, Moin
dc.contributor.author	Kabir, Muhammad Khubayeeb
dc.contributor.author	Labonno, Anindita
dc.contributor.author	Amin, Sofia
dc.contributor.author	Tahsin, Fariha
dc.date.accessioned	2023-08-08T05:48:24Z
dc.date.available	2023-08-08T05:48:24Z
dc.date.copyright	2023
dc.date.issued	2023-01
dc.identifier.other	ID: 19101168
dc.identifier.other	ID: 19101149
dc.identifier.other	ID: 19101232
dc.identifier.other	ID: 19101170
dc.identifier.uri	http://hdl.handle.net/10361/19358
dc.description	This thesis is submitted in partial fulfillment of the requirements for the degree of Bachelor of Science in Computer Science, 2023.	en_US
dc.description	Cataloged from PDF version of thesis.
dc.description	Includes bibliographical references (pages 35-38).
dc.description.abstract	The task of image captioning is a complex process that involves generating textual descriptions for images. This technology is extremely beneficial for a wide range of applications, such as assisting people with visual impairments, monitoring surveillance systems, content generation, image indexing, and automatic annotation of images for producing data for training AI-based image generation models. Much of the research done in this particular domain, especially using transformer models, has been focused on the English language. However, there has been relatively little research dedicated to the context of the Bengali language. This study addresses the lack of research in the context of the Bengali language and proposes a novel approach to automatic image captioning that involves a multi-modal, transformer-based, end-to-end model with an encoder-decoder architecture. Our approach utilizes a pre-trained EfficientNet Transformer Network. To evaluate the effectiveness of our approach, we compare our model with a Vision Transformer that utilizes a non-convolutional encoder pre-trained on ImageNet. The two models were tested on the BanglaLekhaImageCaptions dataset and evaluated using BLEU metrics.	en_US
dc.description.statementofresponsibility	Muhammad Khubayeeb Kabir
dc.description.statementofresponsibility	Anindita Labonno
dc.description.statementofresponsibility	Sofia Amin
dc.description.statementofresponsibility	Fariha Tahsin
dc.format.extent	38 pages
dc.language.iso	en	en_US
dc.publisher	Brac University	en_US
dc.rights	Brac University theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission.
dc.subject	Image captioning	en_US
dc.subject	Image encoders	en_US
dc.subject	EfficientNet	en_US
dc.subject	Vision transformer	en_US
dc.subject	BanglaLekhaImageCaptions	en_US
dc.subject	BLEU	en_US
dc.subject	Transformer architecture	en_US
dc.subject.lcsh	Image analysis.
dc.subject.lcsh	Image processing--Digital techniques.
dc.title	Automatic Bengali image captioning using efficientNet-transformer network and vision transformer	en_US
dc.type	Thesis	en_US
dc.contributor.department	Department of Computer Science and Engineering, Brac University
dc.description.degree	B. Computer Science

Files in this item

Name: 19101168, 19101149, 19101232, ...
Size: 2.075Mb
Format: PDF

This item appears in the following Collection(s)

Thesis & Report, BSc (Computer Science and Engineering) [1480]
