Hand gesture recognition using ensemble method
Abstract
Although communication has improved greatly over the last century, a glaring communication gap remains between the hearing majority and the deaf community, owing to the lack of resources in the field. Real-time hand gesture recognition aims to tear down this barrier and open common ground for everyone; it also plays a vital role in human-computer interaction. Several approaches exist for building a model that recognizes sign languages. They differ in the computation time they require, the algorithms they use, and whether they can run in real time. In this work we thoroughly analyze real-time hand gesture recognition models and propose a pipeline-based approach that selects the best-performing model as the final output. We compare results on four datasets: SLR500, AUTSL-226, WLASL2000 and WLASL100. The goal is to overcome the limitations of data scarcity in the field, along with the imbalance in classification problems. Video inputs are run through different modalities simultaneously via a set of pipelines, and the pipeline outputs are then combined, following the core idea of the ensemble technique, to obtain the final classification result. Various data pre-processing techniques, such as regularization and histogram equalization, are used to minimize bias from varying skin tones, making the model more inclusive and improving classification accuracy. Existing models have no way to deal with the biases encountered in sign language detection, and we take several different approaches to overcome such limitations. In general, for pristine cases with around 500 classes, the model achieves 96.32 percent top-1 accuracy.
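To make the ensemble idea concrete, the following is a minimal sketch of one common way to combine per-pipeline outputs: a weighted average of class scores (late fusion) followed by a top-1 decision. The abstract does not specify the fusion rule, so this is an assumption rather than the method used in this work; the modality names (`rgb`, `skeleton`, `flow`), the function names and the score values are all hypothetical.

```python
from typing import Dict, List, Optional


def fuse_scores(
    pipeline_scores: Dict[str, List[float]],
    weights: Optional[Dict[str, float]] = None,
) -> List[float]:
    """Weighted average of per-class scores from several pipelines.

    Assumption: every pipeline emits one score per class, in the same
    class order. Weights default to equal and are normalized to sum to 1.
    """
    if not pipeline_scores:
        raise ValueError("at least one pipeline is required")

    names = list(pipeline_scores)
    num_classes = len(pipeline_scores[names[0]])
    if any(len(pipeline_scores[n]) != num_classes for n in names):
        raise ValueError("all pipelines must score the same classes")

    if weights is None:
        weights = {n: 1.0 for n in names}
    total = sum(weights[n] for n in names)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")

    fused = [0.0] * num_classes
    for n in names:
        w = weights[n] / total
        for i, score in enumerate(pipeline_scores[n]):
            fused[i] += w * score
    return fused


def top1(scores: List[float]) -> int:
    """Index of the highest-scoring class."""
    return max(range(len(scores)), key=scores.__getitem__)


# Hypothetical softmax outputs for a 3-class toy problem.
example_scores = {
    "rgb": [0.6, 0.3, 0.1],
    "skeleton": [0.2, 0.7, 0.1],
    "flow": [0.3, 0.5, 0.2],
}
predicted_class = top1(fuse_scores(example_scores))
```

With equal weights, a single confident pipeline can be outvoted by the others; raising the weight of a pipeline (for example, in proportion to its validation accuracy) shifts the decision toward it. Selecting only the best-performing pipeline is the limiting case in which one weight is positive and the rest are zero.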