AVCL: Audio-Video Clustering for Learning Conversation Labeling Using Neural Networks and NLP
Abstract
Audiovisual data is the most extensively used and widely distributed type of data on the internet in today’s information and communication age. However, the audiovisual data a user needs is often hard to retrieve because much of it is not correctly categorized. Consequently, locating the required audiovisual data when it is needed is difficult, and a great amount of potentially useful information fails to reach users in a timely manner. A piece of data is only as valuable as the time frame in which it is found. Moreover, because audiovisual files such as lecture notes, recorded classes, and recorded conversations are quite large, skimming through them for the necessary information is time consuming, and the likelihood of not finding the right information on time is high. We therefore propose a novel model that will
take any audio or video file as input and label it according to its content utilizing
Convolutional Neural Networks, BERT, and several machine learning techniques.
Our proposed model accepts any audiovisual file as input, extracts features from its contents, and uses a convolutional neural network together with a transformer to recognize and transcribe the speech in the conversations. Using BERT models and cosine similarity, keywords and key phrases are extracted from the transcript, and the input file is labeled with the key phrases and keywords most similar to the context of its content, so that anyone who later needs similar information can quickly locate the audiovisual file.
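As a rough illustration of the recognition-and-transcription step described above, the sketch below runs a pretrained CNN-plus-transformer speech recognizer over an audio file. The specific model name (facebook/wav2vec2-base-960h) and the file name lecture.wav are illustrative assumptions, not the configuration used in this paper.

```python
# Minimal sketch of the speech-to-text step, assuming the Hugging Face
# `transformers` library is installed. Model name and audio path are
# illustrative placeholders, not the paper's exact recognizer.
from transformers import pipeline

# wav2vec 2.0 pairs a convolutional feature encoder with a transformer,
# mirroring the CNN + transformer combination described in the abstract.
asr = pipeline("automatic-speech-recognition",
               model="facebook/wav2vec2-base-960h")

transcript = asr("lecture.wav")["text"]  # hypothetical input file
print(transcript)
```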
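The labeling step can be sketched in the same spirit: embed the transcript and a set of candidate phrases with a BERT-based encoder, then rank the candidates by cosine similarity to the transcript's overall context. The sentence-transformers model name and the n-gram candidate extraction below are assumptions chosen for illustration, not the paper's exact pipeline.

```python
# Minimal sketch of BERT + cosine-similarity labeling, assuming the
# `sentence-transformers` and `scikit-learn` packages. The model choice
# and candidate-extraction scheme are illustrative assumptions.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity
from sentence_transformers import SentenceTransformer

def label_transcript(transcript: str, top_k: int = 5) -> list[str]:
    # Collect candidate keywords/key phrases (1- to 3-word n-grams).
    vectorizer = CountVectorizer(ngram_range=(1, 3), stop_words="english")
    candidates = vectorizer.fit([transcript]).get_feature_names_out()

    # Embed the full transcript and each candidate with a BERT-based encoder.
    model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed model choice
    doc_emb = model.encode([transcript])
    cand_embs = model.encode(list(candidates))

    # Rank candidates by cosine similarity to the transcript's context
    # and keep the top_k as labels for the audiovisual file.
    scores = cosine_similarity(doc_emb, cand_embs)[0]
    ranked = sorted(zip(candidates, scores), key=lambda p: p[1], reverse=True)
    return [phrase for phrase, _ in ranked[:top_k]]

if __name__ == "__main__":
    print(label_transcript("Today's lecture covers convolutional neural "
                           "networks and how pooling layers reduce dimensions."))
```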