Real-time crime detection using convolutional LSTM and YOLOv7
Abstract
The principal goal of this study is to develop a real-time crime detection system that
can process closed-circuit television (CCTV) video feeds and evaluate them for
possible criminal occurrences. The system aims to improve public safety by
combining ConvLSTM’s strength in modeling temporal dynamics with YOLOv7’s
strength in object detection. We propose a posture and weapon recognition system
that can be applied to real-time video.
The first method uses ConvLSTM to detect violent postures. The convolutional
feature extractor is derived from MobileNetV2, and its output is fed to a
bidirectional LSTM. MobileNetV2 was chosen because its lightweight architecture
offers a strong balance of accuracy and efficiency. The model is trained to
recognize illegal behavior on annotated surveillance-video datasets depicting
different types of crime. The system distinguishes between violent and non-violent
postures in real-time video, classifying actions such as kicking, collar grabbing,
choking, hair pulling, punching, and slapping as violent, and actions such as
hugging, handshaking, touching shoulders, and walking as non-violent. We used a
real-time violence and non-violence dataset from Kaggle.
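The per-frame feature extraction followed by bidirectional temporal modeling can be illustrated with a minimal NumPy sketch. This is only a shape-level stand-in, not the trained system: the random projection plays the role of the MobileNetV2 backbone (its 1280-dimensional pooled output size is assumed), and a plain tanh recurrence stands in for the LSTM cells; clip length, frame size, and hidden width are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: 16-frame clips of 64x64 RGB frames,
# 1280-d per-frame features (MobileNetV2's pooled output size),
# 64 hidden units per recurrent direction.
T, H, W, C = 16, 64, 64, 3
FEAT, HID = 1280, 64

# Stand-in backbone weights (the real system uses a pretrained
# MobileNetV2; this random projection only illustrates data flow).
W_backbone = rng.standard_normal((H * W * C, FEAT)) * 0.01

def backbone(frame):
    """Map one frame to a fixed-length feature vector."""
    return np.tanh(frame.reshape(-1) @ W_backbone)

# Simplified recurrent cell (plain tanh RNN standing in for an LSTM).
W_in = rng.standard_normal((FEAT, HID)) * 0.01
W_rec = rng.standard_normal((HID, HID)) * 0.01
W_out = rng.standard_normal(2 * HID) * 0.01

def run_direction(feats):
    """Run the recurrence over a feature sequence, return last state."""
    h = np.zeros(HID)
    for f in feats:
        h = np.tanh(f @ W_in + h @ W_rec)
    return h

def classify_clip(frames):
    """Score a clip as violent (near 1) vs. non-violent (near 0)."""
    feats = [backbone(f) for f in frames]
    # Bidirectional pass: forward and time-reversed sequences,
    # concatenated before the final classifier.
    h = np.concatenate([run_direction(feats), run_direction(feats[::-1])])
    return 1.0 / (1.0 + np.exp(-(h @ W_out)))  # sigmoid score

clip = rng.random((T, H, W, C))
score = classify_clip(clip)
```

In the actual pipeline the backbone weights come from a pretrained network and the recurrence carries LSTM gates; the sketch only shows how frame features flow into a bidirectional temporal model that emits one violence score per clip.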
The second method uses YOLOv7 to detect weapons in three categories: sticks,
guns, and sharp objects. YOLOv4 was also evaluated for this task; however,
YOLOv7 yielded superior results and was therefore chosen for implementation. We
customized the weapons dataset so that the model can accurately detect weapons
common in local Asian contexts, such as machetes and sticks. The system is
intended to help prevent criminal acts by running the two machine learning
models seamlessly together.
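A standard post-processing step in YOLO-family detectors is non-maximum suppression (NMS), which collapses overlapping candidate boxes for the same weapon into a single detection. The sketch below is a generic NumPy implementation of greedy NMS, not code from the YOLOv7 repository; the boxes, scores, and IoU threshold are illustrative.

```python
import numpy as np

def iou(box, boxes):
    """IoU of one [x1, y1, x2, y2] box against an array of boxes."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = lambda b: (b[..., 2] - b[..., 0]) * (b[..., 3] - b[..., 1])
    return inter / (area(box) + area(boxes) - inter)

def nms(boxes, scores, iou_thresh=0.45):
    """Greedy NMS: keep the highest-scoring box, drop heavy overlaps."""
    order = np.argsort(scores)[::-1]
    keep = []
    while order.size:
        i = order[0]
        keep.append(int(i))
        rest = order[1:]
        order = rest[iou(boxes[i], boxes[rest]) < iou_thresh]
    return keep

# Illustrative detections: two overlapping "gun" candidates and one
# distant "stick" candidate; NMS keeps one box per object.
boxes = np.array([[10, 10, 60, 60], [12, 12, 62, 62],
                  [100, 100, 150, 150]], dtype=float)
scores = np.array([0.9, 0.8, 0.7])
kept = nms(boxes, scores)  # → [0, 2]
```

The same suppression logic runs regardless of weapon class, so merging detections across the stick, gun, and sharp-object categories is a per-class loop over this routine.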