dc.contributor.advisor | Rahman, Md. Khalilur | |
dc.contributor.advisor | Mostakim, Moin | |
dc.contributor.author | Kabir, Muhammad Khubayeeb | |
dc.contributor.author | Labonno, Anindita | |
dc.contributor.author | Amin, Sofia | |
dc.contributor.author | Tahsin, Fariha | |
dc.date.accessioned | 2023-08-08T05:48:24Z | |
dc.date.available | 2023-08-08T05:48:24Z | |
dc.date.copyright | 2023 | |
dc.date.issued | 2023-01 | |
dc.identifier.other | ID: 19101168 | |
dc.identifier.other | ID: 19101149 | |
dc.identifier.other | ID: 19101232 | |
dc.identifier.other | ID: 19101170 | |
dc.identifier.uri | http://hdl.handle.net/10361/19358 | |
dc.description | This thesis is submitted in partial fulfillment of the requirements for the degree of Bachelor of Science in Computer Science, 2023. | en_US |
dc.description | Cataloged from PDF version of thesis. | |
dc.description | Includes bibliographical references (pages 35-38). | |
dc.description.abstract | The task of image captioning is a complex process that involves generating textual
descriptions for images. This technology is extremely beneficial for a wide range of
applications, such as assisting people with visual impairments, monitoring surveillance systems, content generation, image indexing, and automatic annotation of
images for producing data for training AI-based image generation models. Much of
the research done in this particular domain, especially using transformer models, has
been focused on the English language. However, there has been relatively little research
dedicated to the context of the Bengali language. This study addresses the lack of
research in the context of the Bengali language and proposes a novel approach to automatic image captioning that involves a multi-modal, transformer-based, end-to-end
model with an encoder-decoder architecture. Our approach utilizes a pre-trained EfficientNet Transformer Network. To evaluate the effectiveness of our approach, we
compare our model with a Vision Transformer that utilizes a non-convolutional encoder pre-trained on ImageNet. The two models were tested on the BanglaLekhaImageCaptions dataset and evaluated using BLEU metrics. | en_US |
dc.description.statementofresponsibility | Muhammad Khubayeeb Kabir | |
dc.description.statementofresponsibility | Anindita Labonno | |
dc.description.statementofresponsibility | Sofia Amin | |
dc.description.statementofresponsibility | Fariha Tahsin | |
dc.format.extent | 38 pages | |
dc.language.iso | en | en_US |
dc.publisher | Brac University | en_US |
dc.rights | Brac University theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission. | |
dc.subject | Image captioning | en_US |
dc.subject | Image encoders | en_US |
dc.subject | EfficientNet | en_US |
dc.subject | Vision transformer | en_US |
dc.subject | BanglaLekhaImageCaptions | en_US |
dc.subject | BLEU | en_US |
dc.subject | Transformer architecture | en_US |
dc.subject.lcsh | Image analysis. | |
dc.subject.lcsh | Image processing--Digital techniques. | |
dc.title | Automatic Bengali image captioning using EfficientNet-transformer network and vision transformer | en_US |
dc.type | Thesis | en_US |
dc.contributor.department | Department of Computer Science and Engineering, Brac University | |
dc.description.degree | B. Computer Science | |