dc.contributor.advisor | Chakrabarty, Amitabha | |
dc.contributor.author | Ghosh, Dipon Kumar | |
dc.date.accessioned | 2022-01-17T06:37:05Z | |
dc.date.available | 2022-01-17T06:37:05Z | |
dc.date.copyright | 2021 | |
dc.date.issued | 2021-11 | |
dc.identifier.other | ID 19366007 | |
dc.identifier.uri | http://hdl.handle.net/10361/15946 | |
dc.description | This thesis is submitted in partial fulfilment of the requirements for the degree of Master of Engineering in Computer Science and Engineering, 2021. | en_US |
dc.description | Cataloged from PDF version of thesis. | |
dc.description | Includes bibliographical references (pages 67-75). | |
dc.description.abstract | Human action recognition (HAR) has been performed using current deep learning
(DL) algorithms using a variety of input formats, including video footage, optical
flow, and even skeleton points, which may be acquired via depth sensors or pose
estimation technologies. Recent techniques, on the other hand, are computationally
costly and have a high memory footprint, making them unsuitable for use in real-world
environments. Furthermore, the design of existing techniques does not allow
for the full extraction of spatial and temporal characteristics of an action, and as
a result, information is lost throughout the recognition process. Here, we present a
novel framework for action recognition that extracts spatial and temporal characteristics
separately while substantially reducing information loss. The multi-dimensional
convolutional network (MDCN) and the redefined spatio-temporal graph convolutional
network (RST-GCN) are two models developed
in accordance with this framework. In both cases, spatial and temporal information
is extracted irrespective of the precise spatio-temporal location. Our approach was
evaluated on two particular tasks of human action recognition, namely violence detection
and skeleton-based action recognition, in order to ensure that our models
were accurate and reliable. Despite being cost-effective and having fewer parameters,
our proposed MDCN achieved 87.5% accuracy on the largest violence detection
benchmark dataset, and RST-GCN obtained 92.2% accuracy on the skeleton dataset.
We also analyze and compare the performance of our models on edge devices with
limited resources, which are suitable for deployment in real-world environments such
as surveillance systems and smart healthcare systems. The proposed MDCN model
processes 80 frames per second on an edge device such as the Nvidia Jetson Nano,
and RST-GCN performs at a speed of 993 frames per second. Our proposed methods
offer a strong balance between accuracy, memory consumption, and processing time,
which makes them suitable for deployment in real-world environments. | en_US |
dc.description.statementofresponsibility | Dipon Kumar Ghosh | |
dc.format.extent | 75 pages | |
dc.language.iso | en | en_US |
dc.publisher | Brac University | en_US |
dc.rights | Brac University theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission. | |
dc.subject | Human action recognition (HAR) | en_US |
dc.subject | Surveillance systems | en_US |
dc.subject | Violence detection | en_US |
dc.subject | Skeleton-based human action recognition | en_US |
dc.subject | Convolutional neural network (CNN) | en_US |
dc.subject | Graph convolutional networks (GCN) | en_US |
dc.subject | Feature fusion | en_US |
dc.subject.lcsh | Human activity recognition | |
dc.subject.lcsh | Neural networks (Computer science) | |
dc.title | Efficient spatio-temporal feature extraction for human action recognition | en_US |
dc.type | Thesis | en_US |
dc.contributor.department | Department of Computer Science and Engineering, Brac University | |
dc.description.degree | M. Computer Science and Engineering | |