dc.contributor.advisor  Rahman, Md. Khalilur
dc.contributor.advisor  Mostakim, Moin
dc.contributor.author  Kabir, Muhammad Khubayeeb
dc.contributor.author  Labonno, Anindita
dc.contributor.author  Amin, Sofia
dc.contributor.author  Tahsin, Fariha
dc.date.accessioned  2023-08-08T05:48:24Z
dc.date.available  2023-08-08T05:48:24Z
dc.date.copyright  2023
dc.date.issued  2023-01
dc.identifier.other  ID: 19101168
dc.identifier.other  ID: 19101149
dc.identifier.other  ID: 19101232
dc.identifier.other  ID: 19101170
dc.identifier.uri  http://hdl.handle.net/10361/19358
dc.description  This thesis is submitted in partial fulfillment of the requirements for the degree of Bachelor of Science in Computer Science, 2023.  en_US
dc.description  Cataloged from PDF version of thesis.
dc.description  Includes bibliographical references (pages 35-38).
dc.description.abstract  The task of image captioning is a complex process that involves generating textual descriptions for images. This technology is beneficial for a wide range of applications, such as assisting people with visual impairments, monitoring surveillance systems, content generation, image indexing, and automatic annotation of images to produce training data for AI-based image generation models. Much of the research in this domain, especially using transformer models, has focused on the English language, and relatively little has been dedicated to Bengali. This study addresses that gap and proposes a novel approach to automatic image captioning: a multi-modal, transformer-based, end-to-end model with an encoder-decoder architecture. Our approach utilizes a pre-trained EfficientNet-Transformer network. To evaluate its effectiveness, we compare our model with a Vision Transformer that uses a non-convolutional encoder pre-trained on ImageNet. The two models were tested on the BanglaLekhaImageCaptions dataset and evaluated using BLEU metrics.  en_US
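As a reading aid for the abstract above, the following is a minimal, hypothetical Python/PyTorch sketch of the kind of encoder-decoder captioning model it describes: an ImageNet-pre-trained EfficientNet backbone supplies a grid of image features, and a Transformer decoder attends to those features while predicting Bengali caption tokens. It is not the authors' implementation; the class name CaptionModel, the choice of EfficientNet-B0, and all dimensions and hyperparameters (d_model, nhead, num_layers, vocab_size, max_len) are illustrative assumptions.

import torch
import torch.nn as nn
import torchvision

class CaptionModel(nn.Module):
    """Hypothetical EfficientNet encoder + Transformer decoder captioner (illustrative only)."""

    def __init__(self, vocab_size=10000, d_model=256, nhead=4, num_layers=3, max_len=40):
        super().__init__()
        # Image encoder: a frozen, ImageNet-pre-trained EfficientNet-B0 feature extractor.
        backbone = torchvision.models.efficientnet_b0(weights="IMAGENET1K_V1")
        self.cnn = backbone.features                   # outputs (B, 1280, 7, 7) for 224x224 inputs
        for p in self.cnn.parameters():
            p.requires_grad = False
        self.proj = nn.Linear(1280, d_model)           # project CNN channels to the decoder width

        # Caption decoder: token + position embeddings feeding a standard Transformer decoder.
        self.tok_emb = nn.Embedding(vocab_size, d_model)
        self.pos_emb = nn.Embedding(max_len, d_model)
        layer = nn.TransformerDecoderLayer(d_model, nhead, dim_feedforward=512, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers)
        self.out = nn.Linear(d_model, vocab_size)

    def forward(self, images, tokens):
        # images: (B, 3, 224, 224); tokens: (B, T) caption token ids used for teacher forcing.
        feats = self.cnn(images)                       # (B, 1280, 7, 7) spatial feature map
        feats = feats.flatten(2).transpose(1, 2)       # (B, 49, 1280) sequence of image regions
        memory = self.proj(feats)                      # (B, 49, d_model) encoder memory

        T = tokens.size(1)
        pos = torch.arange(T, device=tokens.device)
        x = self.tok_emb(tokens) + self.pos_emb(pos)   # (B, T, d_model) embedded caption prefix
        causal = torch.triu(                           # additive causal mask: -inf above diagonal
            torch.full((T, T), float("-inf"), device=tokens.device), diagonal=1)
        h = self.decoder(x, memory, tgt_mask=causal)   # masked self-attention + cross-attention
        return self.out(h)                             # (B, T, vocab_size) next-token logits

In a setup like this, training would minimize cross-entropy between the logits and the shifted caption tokens; at inference, captions are generated autoregressively and can be scored against reference captions with BLEU, e.g., nltk.translate.bleu_score.corpus_bleu.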
dc.description.statementofresponsibility  Muhammad Khubayeeb Kabir
dc.description.statementofresponsibility  Anindita Labonno
dc.description.statementofresponsibility  Sofia Amin
dc.description.statementofresponsibility  Fariha Tahsin
dc.format.extent  38 pages
dc.language.iso  en  en_US
dc.publisher  Brac University  en_US
dc.rights  Brac University theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission.
dc.subject  Image captioning  en_US
dc.subject  Image encoders  en_US
dc.subject  EfficientNet  en_US
dc.subject  Vision transformer  en_US
dc.subject  BanglaLekhaImageCaptions  en_US
dc.subject  BLEU  en_US
dc.subject  Transformer architecture  en_US
dc.subject.lcsh  Image analysis.
dc.subject.lcsh  Image processing--Digital techniques.
dc.title  Automatic Bengali image captioning using EfficientNet-Transformer network and Vision Transformer  en_US
dc.type  Thesis  en_US
dc.contributor.department  Department of Computer Science and Engineering, Brac University
dc.description.degree  B. Computer Science

