dc.contributor.advisor | Mostakim, Moin | |
dc.contributor.author | Chowdhury, Salman Mostafiz | |
dc.contributor.author | Rohid, Ali Ahammed | |
dc.contributor.author | Hussain, Rizwan | |
dc.contributor.author | Mostafa, Chowdhury Sujana | |
dc.date.accessioned | 2022-08-23T04:42:05Z | |
dc.date.available | 2022-08-23T04:42:05Z | |
dc.date.copyright | 2022 | |
dc.date.issued | 2022-01 | |
dc.identifier.other | ID: 17101149 | |
dc.identifier.other | ID: 17101361 | |
dc.identifier.other | ID: 17301092 | |
dc.identifier.other | ID: 21101107 | |
dc.identifier.uri | http://hdl.handle.net/10361/17114 | |
dc.description | This thesis is submitted in partial fulfillment of the requirements for the degree of Bachelor of Science in Computer Science, 2022 | en_US |
dc.description | Cataloged from PDF version of thesis. | |
dc.description | Includes bibliographical references (pages 39-43). | |
dc.description.abstract | Audiovisual data is the most extensively used and abundantly distributed type of
data on the internet in today’s information and communication age. However, the
necessary audiovisual data is challenging to retrieve because most of it is not
correctly categorized. Consequently, it is difficult to locate the required audiovisual
data in times of need, and a great amount of potentially useful information fails to
reach users in a timely manner. A piece of data is only as good as the time frame in
which it was acquired. Additionally, because audiovisual files such as lecture notes,
recorded classes, and recorded conversations are quite large, skimming through a
large amount of audiovisual data for the necessary information can be time-consuming,
and the likelihood of not receiving the appropriate information on time is relatively
high. We therefore propose a novel model that takes any audio or video file as input
and labels it according to its content using Convolutional Neural Networks, BERT,
and several machine learning techniques. Our proposed model accepts any audiovisual
file as input, extracts features from its contents, and uses a convolutional neural
network and a transformer to recognize and transcribe the speech in the conversations.
Using BERT models and cosine similarity, keywords and key phrases are extracted from
the transcript, and the input file is then labeled with those most similar to the
context of its content, so that anyone in need of similar information in the future
can quickly locate this audiovisual file. | en_US |
dc.description.statementofresponsibility | Salman Mostafiz Chowdhury | |
dc.description.statementofresponsibility | Ali Ahammed Rohid | |
dc.description.statementofresponsibility | Rizwan Hussain | |
dc.description.statementofresponsibility | Chowdhury Sujana Mostafa | |
dc.format.extent | 43 pages | |
dc.language.iso | en_US | en_US |
dc.publisher | Brac University | en_US |
dc.rights | Brac University theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission. | |
dc.subject | Speech Recognition | en_US |
dc.subject | Wav2vec2 | en_US |
dc.subject | BERT | en_US |
dc.subject | AVSpeech | en_US |
dc.subject | Keywords extraction | en_US |
dc.subject | Neural Networks | en_US |
dc.subject | NLP | en_US |
dc.subject.lcsh | Automatic speech recognition | |
dc.subject.lcsh | Neural networks (Computer science) | |
dc.title | AVCL: Audio Video clustering for learning Conversation labeling using Neural Network and NLP | en_US |
dc.type | Thesis | en_US |
dc.contributor.department | Department of Computer Science and Engineering, Brac University | |
dc.description.degree | B.Sc. in Computer Science | |