Motion based gesture detection using frame composition LSTM
Date
2022-09-28
Publisher
Brac University
Author
Islam, Ishraqul
Islam, Md. Saqif
Provat, Mahin Islam
Khandakar, Shaneen Shadman
Karim, Fardin Junayed
Abstract
Years of technological progress have made machines capable of identifying humans
in images and videos. Moreover, machines like computers can also detect our hand
gestures. Gesture recognition is the tool needed to comprehend sign languages.
Sign language recognition is an important part of computer vision that deals with the
visual-manual modality of expression. It helps break down the communication barrier between deaf and mute people and the hearing population. Currently,
there are around 432 million deaf and mute people in the world, which is around 5% of the total global population. To close this communication gap, we focus on creating
an application for detecting sign language which will detect hand gestures and show
us the output in the form of text. Many sign languages exist, but in
this paper we mainly deal with American Sign Language (ASL). Although
datasets for this task are available on the internet, we collect our own set of words via our real-time data collection system and form
sentences using our model. To develop this model we use Long Short-Term
Memory (LSTM) networks. LSTM networks are a class of recurrent neural network (RNN) that can learn order dependency
in sequence prediction problems. This is a necessary characteristic in complex
problem domains such as machine translation and speech recognition; we therefore
use it to recognize gestures from images and video captured via a camera
or webcam. Furthermore, to detect and model the pose, we use the MediaPipe Holistic library together with OpenCV. This lets us draw the landmarks
on skeleton poses, giving us a generalized view of an individual’s appearance and background and allowing more focus on the perception of motion. Because we extract features from each frame of our videos and then compose them into an
LSTM, we named our model Frame Composition LSTM.
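
The frame composition idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that each frame has already been reduced to landmark lists of (x, y, z) tuples (MediaPipe Holistic reports 33 pose landmarks and 21 landmarks per hand), that only pose and hand landmarks are used, that undetected parts are filled with zeros, and that clips are padded or truncated to a fixed length. The function names, the sequence length, and the hidden size are hypothetical, and the LSTM weights are random rather than trained.

```python
import math
import random

POSE_POINTS = 33   # pose landmarks reported by MediaPipe Holistic
HAND_POINTS = 21   # landmarks per hand
# Assumption: only pose and both hands are used, with x, y, z per landmark.
FEATURES_PER_FRAME = (POSE_POINTS + 2 * HAND_POINTS) * 3


def flatten_part(landmarks, expected_points):
    """Flatten [(x, y, z), ...]; use zeros when the part was not detected."""
    if landmarks is None:
        return [0.0] * (expected_points * 3)
    if len(landmarks) != expected_points:
        raise ValueError("unexpected number of landmarks")
    return [float(coord) for point in landmarks for coord in point]


def frame_features(pose, left_hand, right_hand):
    """Build one fixed-length feature vector for a single frame."""
    return (flatten_part(pose, POSE_POINTS)
            + flatten_part(left_hand, HAND_POINTS)
            + flatten_part(right_hand, HAND_POINTS))


def compose_sequence(frames, sequence_length):
    """Truncate or zero-pad per-frame vectors to a fixed number of time steps."""
    sequence = [list(frame) for frame in frames[:sequence_length]]
    while len(sequence) < sequence_length:
        sequence.append([0.0] * FEATURES_PER_FRAME)
    return sequence


def sigmoid(value):
    return 1.0 / (1.0 + math.exp(-value))


def init_weights(input_size, hidden_size, seed=0):
    """Random (untrained) weights for the input, forget, output and cell gates."""
    rng = random.Random(seed)

    def matrix(rows, cols):
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

    return {gate: (matrix(hidden_size, input_size),
                   matrix(hidden_size, hidden_size),
                   [0.0] * hidden_size)
            for gate in "ifog"}


def lstm_step(x, h, c, weights):
    """One standard LSTM time step over feature vector x."""
    def affine(gate):
        w_x, w_h, bias = weights[gate]
        return [sum(a * b for a, b in zip(w_x[j], x))
                + sum(a * b for a, b in zip(w_h[j], h))
                + bias[j]
                for j in range(len(h))]

    i_gate = [sigmoid(v) for v in affine("i")]
    f_gate = [sigmoid(v) for v in affine("f")]
    o_gate = [sigmoid(v) for v in affine("o")]
    candidate = [math.tanh(v) for v in affine("g")]
    c_new = [f * c_old + i * g
             for f, c_old, i, g in zip(f_gate, c, i_gate, candidate)]
    h_new = [o * math.tanh(c_val) for o, c_val in zip(o_gate, c_new)]
    return h_new, c_new


def encode_clip(sequence, weights, hidden_size):
    """Run the LSTM over all frames and return the final hidden state."""
    h = [0.0] * hidden_size
    c = [0.0] * hidden_size
    for x in sequence:
        h, c = lstm_step(x, h, c, weights)
    return h
```

In a full system, the final hidden state would feed a classification layer over the word vocabulary, and the landmark lists would come from running MediaPipe Holistic on frames read with OpenCV. Both steps are omitted here so that the sketch runs with the standard library alone.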