Predicting peak performance of a cricket player using machine learning and data analytics
Abstract
In the modern era, cricket has become more batting-friendly than ever. However, bowlers who adapt well to the prevailing conditions can still change the dynamics of a game. Prior studies have focused mostly on team combinations and batting analytics and have not examined the individual potential of batters and bowlers. This paper investigates what lies behind an impactful performance. It does so by determining how much control a player has over the circumstances and by computing two new measures we propose: "Effective Runs" and "Effective Wickets." We first gathered basic cricket data from open-source datasets. However, variables such as the pitch, the weather, and control were not readily available for all matches. We therefore compiled our own corpus by analyzing the ball-by-ball commentary in match summaries. This allowed us to determine which shots played by a batter, and which deliveries bowled by a bowler, were in control. Our dataset covers seven renowned international cricketers. For batters, we prepared, encoded, and scaled the dataset and split it into training and test sets. We then used these sets to train and evaluate machine learning algorithms that predict the impact a player will have on the game. Multiple Linear Regression and Random Forest gave the best prediction accuracies, 90.16% and 87.12%, respectively. For bowlers, we instead scaled up the wickets taken and set a threshold accordingly. When the threshold was met, we considered the bowler's effective wickets impactful with regard to the overall match performance. Machine learning classifiers were then trained to predict this impact. The best individual accuracies were achieved by Logistic Regression for spinners (73.21%) and by an SVM classifier for seamers (79.17%). Across both types of bowlers, however, the best average precision, 78.75%, was obtained by Logistic Regression.
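The batter pipeline summarized above (encode, scale, split, then fit a regression model) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The feature names (pitch type, weather, share of in-control shots, balls faced), their category levels, and the "effective runs" target formula are all hypothetical. The data is randomly generated placeholder data, not the paper's corpus. The sketch reports held-out R², which is not necessarily the metric behind the accuracy percentages quoted in the abstract.

```python
import numpy as np

# Hypothetical category levels; the paper's real encodings are not given here.
PITCHES = ["flat", "green", "dusty"]
WEATHER = ["sunny", "overcast"]


def dummy_encode(values, levels):
    """One-hot encode labels, dropping the first level so the dummy columns
    are not collinear with the intercept column."""
    out = np.zeros((len(values), len(levels) - 1))
    for i, v in enumerate(values):
        k = levels.index(v)
        if k > 0:
            out[i, k - 1] = 1.0
    return out


def synthetic_innings(n, rng):
    """Placeholder data for illustration only; NOT the paper's dataset."""
    pitch = [PITCHES[k] for k in rng.integers(0, len(PITCHES), n)]
    weather = [WEATHER[k] for k in rng.integers(0, len(WEATHER), n)]
    control = rng.uniform(0.5, 1.0, n)              # hypothetical share of in-control shots
    balls = rng.integers(10, 120, n).astype(float)  # hypothetical balls faced
    is_flat = np.array([p == "flat" for p in pitch], dtype=float)
    # Invented relationship, used only to produce a target to fit.
    effective_runs = 40 * control + 0.3 * balls + 5 * is_flat + rng.normal(0, 2, n)
    return pitch, weather, np.column_stack([control, balls]), effective_runs


def fit_and_score(seed=0, n=200, test_frac=0.2):
    rng = np.random.default_rng(seed)
    pitch, weather, numeric, y = synthetic_innings(n, rng)

    # 1) Random train/test split.
    idx = rng.permutation(n)
    n_test = int(n * test_frac)
    test, train = idx[:n_test], idx[n_test:]

    # 2) Standardize numeric features using training-set statistics only,
    #    so no information from the test set leaks into the scaling.
    mu = numeric[train].mean(axis=0)
    sd = numeric[train].std(axis=0)
    scaled = (numeric - mu) / sd

    # 3) Design matrix: intercept + dummy-encoded categoricals + scaled numerics.
    X = np.column_stack([
        np.ones(n),
        dummy_encode(pitch, PITCHES),
        dummy_encode(weather, WEATHER),
        scaled,
    ])

    # 4) Multiple linear regression fitted by ordinary least squares.
    coef, *_ = np.linalg.lstsq(X[train], y[train], rcond=None)

    # 5) Coefficient of determination (R^2) on the held-out test rows.
    pred = X[test] @ coef
    ss_res = np.sum((y[test] - pred) ** 2)
    ss_tot = np.sum((y[test] - y[test].mean()) ** 2)
    return coef, 1.0 - ss_res / ss_tot


coef, r2 = fit_and_score()
print(f"held-out R^2 = {r2:.3f}")
```

Scaling statistics are computed on the training rows only. A Random Forest or any other regressor could be swapped in at step 4 without changing the preparation steps.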