Analysis of real-time hostile activity detection from spatiotemporal features using time-distributed deep convolutional neural networks, recurrent neural networks and attention-based mechanisms
Abstract
In recent years, there has been a surge of hostile activities in public places across the globe. Advances in technology have made it possible to monitor public places through real-time surveillance. Video surveillance has become essential for ensuring public safety, as it significantly lowers the crime rate and allows facilities to be monitored within its reach. Hence, CCTV cameras are installed wherever security is a priority. Although CCTV cameras greatly improve security, the main drawback of these surveillance systems is that they require constant human monitoring. To eradicate this issue, an automated surveillance system can be built using artificial intelligence, deep learning, and IoT (Internet of Things). In this research, we explore deep learning video classification techniques that can help automate surveillance systems to detect violence as it happens. Traditional machine learning and image classification techniques fall short when classifying videos because they classify each frame independently, which causes predictions to flicker. Consequently, many researchers have proposed video classification techniques that consider spatiotemporal features. However, deploying such deep learning models is not always practical in an IoT environment. For this reason, we cannot use features such as skeleton points or optical flow, which are acquired through technologies like pose estimation or depth sensors. Although these techniques yield higher accuracy, they are computationally heavy. Keeping these constraints in mind, we experimented with various video classification and action recognition techniques such as ConvLSTM, LRCN (with both custom CNN layers and VGG-16 as the feature extractor), CNN-Transformer, and C3D (3D CNN). We achieved test accuracies of 80% with ConvLSTM, 83.33% with CNN-BiLSTM, 70% with VGG16-BiLSTM, 76.76% with CNN-Transformer, and 80% with the C3D model.
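As a concrete illustration of the time-distributed approach described above, the sketch below builds a minimal LRCN-style model in Keras: a small CNN applied to each frame via TimeDistributed, followed by a bidirectional LSTM over the per-frame features. The clip length, frame resolution, and filter counts here are illustrative assumptions, not the exact configuration evaluated in this work.

```python
# Minimal LRCN-style sketch (assumed hyperparameters, not the paper's exact model):
# a per-frame CNN wrapped in TimeDistributed, feeding a bidirectional LSTM
# for binary violence classification.
from tensorflow.keras import layers, models

SEQ_LEN, H, W, C = 16, 64, 64, 3  # assumed: clips of 16 frames, 64x64 RGB

model = models.Sequential([
    # Apply the same small CNN independently to every frame in the clip.
    layers.TimeDistributed(
        layers.Conv2D(16, (3, 3), activation="relu", padding="same"),
        input_shape=(SEQ_LEN, H, W, C)),
    layers.TimeDistributed(layers.MaxPooling2D((4, 4))),
    layers.TimeDistributed(
        layers.Conv2D(32, (3, 3), activation="relu", padding="same")),
    layers.TimeDistributed(layers.MaxPooling2D((4, 4))),
    layers.TimeDistributed(layers.Flatten()),
    # Model temporal dependencies across the per-frame feature vectors.
    layers.Bidirectional(layers.LSTM(32)),
    layers.Dense(1, activation="sigmoid"),  # violent vs. non-violent
])

model.compile(optimizer="adam",
              loss="binary_crossentropy",
              metrics=["accuracy"])
model.summary()
```

Classifying the whole clip from the LSTM's final state, rather than each frame separately, is what suppresses the frame-by-frame prediction flicker noted above.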