Detection of violent activity in surveillance system using different deep learning techniques
Abstract
Over the past decade, surveillance cameras have become an integral part of security
measures in all types of localities. The ubiquity of these devices has
substantially aided in tackling violent crime. In larger systems, continuous
manual monitoring becomes cumbersome and often delays the
response. Therefore, automated recognition of aggressive activities in surveillance
systems can ease remote monitoring and improve the precision
of the response. Previous experiments on various deep-learning techniques such as Convolutional
Neural Networks (CNNs) have tackled the challenge by identifying potentially
violent activities in real time with good accuracy. The aim of this research is to
reduce computational cost while maintaining performance suitable for practical,
real-world deployment. Hence, as a first step in this study, a lightweight yet
highly effective CNN model has been proposed that extracts spatial features using
2D convolutions. Subsequently, several custom models combining CNN
and Recurrent Neural Network (RNN) architectures have been developed to extract
spatio-temporal features from the videos. The models have undergone extensive
tuning and training and are capable
of accurately extracting frame-level and temporal features, depending on the architecture
type. They have then been evaluated on a combination
of multiple benchmark datasets to compare their performance. In
conclusion, the proposed spatial feature-based model obtained a test
accuracy of 99.6%, and the best-performing spatio-temporal feature-based model
attained a test accuracy of 98.75%.