Motion based gesture detection using frame composition LSTM

Islam, Ishraqul; Islam, Md. Saqif; Provat, Mahin Islam; Khandakar, Shaneen Shadman; Karim, Fardin Junayed

Date

2022-09-28

Publisher

Brac University

Abstract

Years of technological progress have made machines capable of identifying humans in images and videos, and computers can now also detect hand gestures. Gesture recognition is a prerequisite for machine comprehension of sign languages, and sign language recognition is an important area of computer vision because sign languages use the visual-manual modality of expression. Such recognition can help bridge the communication barrier between deaf and mute people and the hearing population. There are currently around 432 million deaf and mute people in the world, which is around 5% of the global population. To address this communication gap, we focus on creating an application that detects sign-language hand gestures and shows the output as text. Many sign languages exist, but this paper deals mainly with American Sign Language (ASL). Although datasets for this task are available on the internet, we collect our own set of words through our real-time data collection system and form sentences using our model. To develop the model we use Long Short-Term Memory (LSTM) networks, a class of recurrent neural network (RNN) that can learn order dependence in sequence prediction problems. This characteristic is necessary in complex problem domains such as machine translation and speech recognition, so we use it to recognize gestures from images and video captured via a camera or webcam. Furthermore, to detect and model the pose, we use the MediaPipe Holistic library together with OpenCV, which lets us draw landmarks on skeleton poses. This gives us a generalized view of an individual's appearance and background, allowing more focus on the perception of motion. Because we extract features from each frame of our videos and then compose them into an LSTM, we name our model Frame Composition LSTM.
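The frame composition step described in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation: the function names, the choice to use all four MediaPipe Holistic landmark groups (pose, face, both hands), the zero-filling of undetected parts, and the 30-frame sequence length are all assumptions made for illustration. The landmark counts (33 pose, 468 face, 21 per hand) are those documented for MediaPipe Holistic. The sketch only builds the per-video feature sequence; it does not call MediaPipe or train an LSTM.

```python
from typing import List, Optional, Sequence

# (landmark count, values per landmark) for each MediaPipe Holistic group.
# Using all four groups is an assumption, not something the abstract states.
POSE = (33, 4)   # x, y, z, visibility
FACE = (468, 3)  # x, y, z
HAND = (21, 3)   # x, y, z (one entry per hand)
FRAME_SIZE = POSE[0] * POSE[1] + FACE[0] * FACE[1] + 2 * HAND[0] * HAND[1]

Landmarks = Optional[Sequence[Sequence[float]]]


def flatten_part(landmarks: Landmarks, shape: tuple) -> List[float]:
    """Flatten one landmark group; use zeros if it was not detected."""
    count, width = shape
    if landmarks is None:
        return [0.0] * (count * width)
    flat = [float(value) for point in landmarks for value in point]
    if len(flat) != count * width:
        raise ValueError(f"expected {count * width} values, got {len(flat)}")
    return flat


def frame_features(pose: Landmarks, face: Landmarks,
                   left_hand: Landmarks, right_hand: Landmarks) -> List[float]:
    """Concatenate all landmark groups of one frame into one feature vector."""
    return (flatten_part(pose, POSE) + flatten_part(face, FACE)
            + flatten_part(left_hand, HAND) + flatten_part(right_hand, HAND))


def compose_sequence(frames: Sequence[Sequence[float]],
                     length: int = 30) -> List[List[float]]:
    """Truncate or zero-pad per-frame vectors to a fixed-length sequence.

    The result has shape (length, FRAME_SIZE), the kind of input a
    sequence model such as an LSTM expects for one video clip.
    """
    kept = [list(frame) for frame in frames[:length]]
    padding = [[0.0] * FRAME_SIZE for _ in range(length - len(kept))]
    return kept + padding
```

In a real pipeline each landmark group would come from a MediaPipe Holistic result for a frame read with OpenCV, converted to plain coordinate tuples before calling `frame_features`; the composed sequences would then be stacked into a batch for the LSTM.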

Keywords

Frame composition LSTM; ASL; MediaPipe holistic; Hand gesture recognition

LC Subject Headings

Human-computer interaction; Computer communication systems

Description

This thesis is submitted in partial fulfillment of the requirements for the degree of Bachelor of Science in Computer Science, 2022.

Cataloged from PDF version of thesis.

Includes bibliographical references (pages 36-38).

Department

Department of Computer Science and Engineering, Brac University

Type

Thesis

Collections

Thesis & Report, BSc (Computer Science and Engineering) [1586]