
dc.contributor.advisor    Mostakim, Moin
dc.contributor.author    Chowdhury, Salman Mostafiz
dc.contributor.author    Rohid, Ali Ahammed
dc.contributor.author    Hussain, Rizwan
dc.contributor.author    Mostafa, Chowdhury Sujana
dc.date.accessioned    2022-08-23T04:42:05Z
dc.date.available    2022-08-23T04:42:05Z
dc.date.copyright    2022
dc.date.issued    2022-01
dc.identifier.other    ID: 17101149
dc.identifier.other    ID: 17101361
dc.identifier.other    ID: 17301092
dc.identifier.other    ID: 21101107
dc.identifier.uri    http://hdl.handle.net/10361/17114
dc.description    This thesis is submitted in partial fulfillment of the requirements for the degree of Bachelor of Science in Computer Science, 2022    en_US
dc.description    Cataloged from PDF version of thesis.
dc.description    Includes bibliographical references (pages 39-43).
dc.description.abstract    Audiovisual data is the most extensively used and abundantly distributed type of data on the internet in today's information and communication age. However, the audiovisual data a user needs is often hard to retrieve because most of it is not correctly categorized. It is therefore difficult to locate the necessary audiovisual data in times of need, and a great deal of potentially useful information fails to reach users in a timely manner; a piece of data is only as good as the time frame in which it is acquired. Additionally, because audiovisual files such as lecture notes, recorded classes, and recorded conversations are quite large, skimming through a large amount of audiovisual data for the necessary information can be time-consuming, and the likelihood of not receiving the appropriate information on time is relatively high. We therefore propose a novel model that takes any audio or video file as input and labels it according to its content using Convolutional Neural Networks, BERT, and several machine learning techniques. Our proposed model accepts any audiovisual file as input, extracts features from its contents, and uses a convolutional neural network and a transformer to recognize and transcribe the speech in the conversations. Using BERT models and cosine similarity, keywords and phrases are extracted from the transcript, and the input file is labeled with the key phrases and keywords most similar to the context of the content. Finally, the input file is appropriately labeled with these key phrases and keywords so that anyone who needs similar information in the future can quickly locate this audiovisual file.    en_US
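The abstract describes a two-stage pipeline: Wav2vec2-style speech recognition (a CNN feature encoder plus a transformer), followed by BERT-embedding and cosine-similarity keyword extraction over the transcript. The sketch below illustrates one way such a pipeline could be wired together, assuming the Hugging Face transformers, torch, and librosa packages, the public facebook/wav2vec2-base-960h and bert-base-uncased checkpoints, and a simple unigram/bigram candidate strategy; none of these specific choices come from the thesis itself.

```python
# A minimal sketch of the pipeline sketched in the abstract.
# Model names, pooling, and candidate generation are illustrative
# assumptions, not the authors' exact configuration.
import torch
import librosa
from transformers import (
    Wav2Vec2Processor, Wav2Vec2ForCTC, AutoTokenizer, AutoModel,
)

# Stage 1: speech recognition with Wav2vec2 (CNN encoder + transformer).
asr_processor = Wav2Vec2Processor.from_pretrained("facebook/wav2vec2-base-960h")
asr_model = Wav2Vec2ForCTC.from_pretrained("facebook/wav2vec2-base-960h")

def transcribe(audio_path: str) -> str:
    speech, _ = librosa.load(audio_path, sr=16_000)  # Wav2vec2 expects 16 kHz
    inputs = asr_processor(speech, sampling_rate=16_000, return_tensors="pt")
    with torch.no_grad():
        logits = asr_model(inputs.input_values).logits
    ids = torch.argmax(logits, dim=-1)               # greedy CTC decoding
    return asr_processor.batch_decode(ids)[0].lower()

# Stage 2: BERT embeddings + cosine similarity to rank candidate labels.
bert_tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
bert_model = AutoModel.from_pretrained("bert-base-uncased")

def embed(text: str) -> torch.Tensor:
    tokens = bert_tokenizer(text, return_tensors="pt", truncation=True)
    with torch.no_grad():
        hidden = bert_model(**tokens).last_hidden_state
    return hidden.mean(dim=1).squeeze(0)  # mean-pooled sentence embedding

def label_file(audio_path: str, top_k: int = 5) -> list[str]:
    transcript = transcribe(audio_path)
    words = transcript.split()
    # Candidate phrases: unigrams and bigrams drawn from the transcript.
    candidates = set(words) | {" ".join(p) for p in zip(words, words[1:])}
    doc_vec = embed(transcript)
    scored = sorted(
        candidates,
        key=lambda c: torch.cosine_similarity(embed(c), doc_vec, dim=0).item(),
        reverse=True,
    )
    return scored[:top_k]  # labels most similar to the file's overall context
```

Mean-pooling BERT's final hidden states is a common, if rough, way to obtain one vector per phrase for the cosine comparison; the thesis may well use a different pooling scheme or candidate-extraction strategy.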
dc.description.statementofresponsibility    Salman Mostafiz Chowdhury
dc.description.statementofresponsibility    Ali Ahammed Rohid
dc.description.statementofresponsibility    Rizwan Hussain
dc.description.statementofresponsibility    Chowdhury Sujana Mostafa
dc.format.extent    43 pages
dc.language.iso    en_US    en_US
dc.publisher    Brac University    en_US
dc.rights    Brac University theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission.
dc.subject    Speech recognition    en_US
dc.subject    Wav2vec2    en_US
dc.subject    BERT    en_US
dc.subject    AVSpeech    en_US
dc.subject    Keyword extraction    en_US
dc.subject    Neural networks    en_US
dc.subject    NLP    en_US
dc.subject.lcsh    Automatic speech recognition
dc.subject.lcsh    Neural networks (Computer science)
dc.title    AVCL: Audio Video clustering for learning Conversation labeling using Neural Network and NLP    en_US
dc.type    Thesis    en_US
dc.contributor.department    Department of Computer Science and Engineering, Brac University
dc.description.degree    B. Computer Science