Hand gesture recognition using ensemble method

Kowsar, Sahib; Chowdhury, Mahzabin; Mahmud, MD Safin; Haque, Shahbaj Shafin; Shifa, Asaka Akther

Date

2023-05

Publisher

Brac University

Abstract

Although communication has improved greatly over the last century, a glaring communication gap remains between the hearing majority and the deaf community because of the lack of resources in the field. Real-time hand gesture recognition aims to tear down this barrier and open a common ground for everyone; it also plays a vital role in human-computer interaction. There are several ideas on how to build a model that properly recognizes sign languages. The models differ in the computation time they take, the algorithms they use, and whether they can be used in real time. In this work we present a thorough analysis of real-time hand gesture recognition models and propose a pipeline-based approach that selects the best-performing model as the final output. We use four datasets for comparison: SLR500, AUTSL-226, WLASL2000 and WLASL100. The goal is to overcome the limitations of data scarcity in the field along with the imbalance in classification problems. We take video inputs and run them through different modalities simultaneously in a set of pipelines; the pipeline outputs are then combined, following the core idea of the ensemble technique, to obtain the final classification result. Various data pre-processing techniques, such as regularization and histogram equalization, are used to minimize bias from varying skin tones, making the model more inclusive and improving classification accuracy. Existing models have no way to deal with the biases encountered in sign language detection, and we take several different approaches to overcome such limitations. In general, in pristine cases with around 500 classes, the model achieves 96.32 percent top-1 accuracy.
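The abstract does not state how the pipeline outputs are combined, so the following is only an illustrative sketch of one common ensemble rule, weighted averaging of per-pipeline class scores (late fusion), together with the top-1 accuracy metric quoted above. The function names, the weights, and the example scores are hypothetical and are not taken from the thesis.

```python
# Illustrative late-fusion sketch; NOT the thesis's actual method.
# Assumption: each pipeline (modality) outputs one score per class for a video.

def fuse_scores(pipeline_scores, weights=None):
    """Return the weighted average of per-pipeline class scores."""
    if weights is None:
        # Assumed default: every pipeline contributes equally.
        weights = [1.0] * len(pipeline_scores)
    total = sum(weights)
    n_classes = len(pipeline_scores[0])
    return [
        sum(w * scores[c] for w, scores in zip(weights, pipeline_scores)) / total
        for c in range(n_classes)
    ]


def predict(pipeline_scores, weights=None):
    """Return the index of the class with the highest fused score."""
    fused = fuse_scores(pipeline_scores, weights)
    return max(range(len(fused)), key=fused.__getitem__)


def top1_accuracy(predictions, labels):
    """Fraction of samples whose predicted class equals the true label."""
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


if __name__ == "__main__":
    # Two hypothetical pipelines scoring one video over three classes.
    scores = [[0.6, 0.3, 0.1], [0.2, 0.7, 0.1]]
    print(predict(scores))                  # equal weights
    print(predict(scores, weights=[3, 1]))  # trust the first pipeline more
```

With equal weights the second pipeline's confident vote decides the class; raising the first pipeline's weight flips the decision, which is the kind of trade-off an ensemble over modalities has to tune.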

Keywords

Pattern matching; Feature extraction; SSTCN; SL-GCN; Pipeline; Transfer learning; Histogram matching

LC Subject Headings

Artificial intelligence; Optical pattern recognition; Computer vision

Description

This thesis is submitted in partial fulfillment of the requirements for the degree of Bachelor of Science in Computer Science, 2023.

Cataloged from PDF version of thesis.

Includes bibliographical references (pages 27-28).

Department

Department of Computer Science and Engineering, Brac University

Type

Thesis

Collections

Thesis & Report, BSc (Computer Science and Engineering)