dc.contributor.advisor | Chakrabarty, Amitabha | |
dc.contributor.author | Ghosh, Dipon Kumar | |
dc.date.accessioned | 2022-01-17T06:37:05Z | |
dc.date.available | 2022-01-17T06:37:05Z | |
dc.date.copyright | 2021 | |
dc.date.issued | 2021-11 | |
dc.identifier.other | ID 19366007 | |
dc.identifier.uri | http://hdl.handle.net/10361/15946 | |
dc.description | This thesis is submitted in partial fulfilment of the requirements for the degree of Master of Engineering in Computer Science and Engineering, 2021. | en_US |
dc.description | Cataloged from PDF version of thesis. | |
dc.description | Includes bibliographical references (pages 67-75). | |
dc.description.abstract | Human action recognition (HAR) has been performed using current deep learning
(DL) algorithms using a variety of input formats, including video footage, optical
flow, and even skeleton points, which may be acquired via depth sensors or pose
estimation technologies. Recent techniques, on the other hand, are computationally
costly and have a high memory footprint, making them unsuitable for use in real-world
environments. Furthermore, the design of existing techniques does not allow
for the full extraction of spatial and temporal characteristics of an action, and as
a result, information is lost throughout the recognition process. Here, we present a
novel framework for action recognition that extracts spatial and temporal characteristics
separately while substantially reducing information loss. The multi-dimensional
convolutional network (MDCN) and the redefined spatio-temporal graph convolutional
network (RST-GCN) are two models developed
in accordance with this framework. In both cases, spatial and temporal information
is extracted irrespective of the precise spatio-temporal location. Our approach was
evaluated on two particular tasks of human action recognition, namely violence detection
and skeleton-based action recognition, in order to ensure that our models
were accurate and reliable. Despite being cost-effective and having fewer parameters,
our proposed MDCN achieved 87.5% accuracy on the largest violence detection
benchmark dataset, and RST-GCN obtained 92.2% accuracy on the skeleton dataset.
We also analyze and compare the performance of our models on edge devices with
limited resources, which are suitable for deployment in real-world environments such
as surveillance systems and smart healthcare systems. The proposed MDCN model
processes 80 frames per second on an edge device such as the Nvidia Jetson Nano,
and RST-GCN performs at a speed of 993 frames per second. Our proposed methods
offer a strong balance between accuracy, memory consumption, and processing time,
which makes them suitable for deployment in real-world environments. | en_US |
dc.description.statementofresponsibility | Dipon Kumar Ghosh | |
dc.format.extent | 75 pages | |
dc.language.iso | en | en_US |
dc.publisher | Brac University | en_US |
dc.rights | Brac University theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission. | |
dc.subject | Human action recognition (HAR) | en_US |
dc.subject | Surveillance systems | en_US |
dc.subject | Violence detection | en_US |
dc.subject | Skeleton-based human action recognition | en_US |
dc.subject | Convolutional neural network (CNN) | en_US |
dc.subject | Graph convolutional networks (GCN) | en_US |
dc.subject | Feature fusion | en_US |
dc.subject.lcsh | Human activity recognition | |
dc.subject.lcsh | Neural networks (Computer science) | |
dc.title | Efficient spatio-temporal feature extraction for human action recognition | en_US |
dc.type | Thesis | en_US |
dc.contributor.department | Department of Computer Science and Engineering, Brac University | |
dc.description.degree | M. Computer Science and Engineering | |