Enhancing software quality: Python code smell detection using machine learning techniques and refactoring long methods using extract method algorithm
Abstract
Python has witnessed substantial growth, establishing itself as one of the world’s
most popular programming languages. Its versatile applications span various software
and data science projects, empowered by features like classes, method chaining,
lambda functions, and list comprehensions. However, this flexibility introduces the
risk of code smells, which diminish software quality and complicate maintenance.
While extensive research addresses code smells in Java, the Python landscape lacks
comprehensive automated solutions. Our paper fills this gap in two ways. Firstly,
we construct a dataset using Pysmell [16], a tool from the existing literature. Given
a project directory, the tool identifies the Python files and produces comma-separated
files listing the code smells present in each Python file. We assemble a collection
of GitHub projects and run the tool on it. We then select five
comma-separated code smell files: Large Class, Long Method, Long Lambda Function,
Long Parameter List, and Long Message Chain. The comma-separated files
are then combined to produce a multi-label dataset of code smells. Ensemble techniques
and neural networks are trained on the dataset to analyse how well
machine learning models predict code smells from code metrics. Secondly, we
design and build a simple automated refactoring algorithm that
reduces Long Method code smells by extracting large if-else statements into
separate methods, thereby improving overall software quality. In a landscape where automated detection and
refactoring for Python code smells are nascent, our research contributes essential
advancements.
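
As an illustration of the dataset construction step described above, the following is a minimal sketch of how per-smell comma-separated files might be combined into a multi-label table. It assumes a hypothetical layout in which each per-smell file has a header row with an `entity` column and lists only the entities flagged for that smell; this is not Pysmell's actual output format. The label names in `SMELLS` and the helper functions are likewise illustrative.

```python
import csv
import io

# Hypothetical label names for the five smells selected in the abstract.
SMELLS = (
    "large_class",
    "long_method",
    "long_lambda_function",
    "long_parameter_list",
    "long_message_chain",
)


def read_flagged_entities(csv_text):
    """Return the set of entity identifiers listed in one per-smell CSV.

    Assumes (hypothetically) a header row containing an 'entity' column.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["entity"] for row in reader}


def build_multilabel_rows(reports):
    """Combine per-smell reports into one row per entity with 0/1 labels.

    `reports` maps a smell name from SMELLS to the CSV text for that smell.
    Smells without a report are treated as having no flagged entities.
    """
    flagged = {
        smell: read_flagged_entities(reports.get(smell, "entity\n"))
        for smell in SMELLS
    }
    all_entities = sorted(set().union(*flagged.values()))
    return [
        {"entity": entity,
         **{smell: int(entity in flagged[smell]) for smell in SMELLS}}
        for entity in all_entities
    ]


# Toy input: two smell reports over the same (made-up) project.
example_reports = {
    "long_method": "entity\napp.py:parse\napp.py:run\n",
    "long_parameter_list": "entity\napp.py:run\n",
}
rows = build_multilabel_rows(example_reports)
for row in rows:
    print(row)
```

In this sketch an entity can carry several labels at once (here `app.py:run` is flagged for two smells), which is what makes the resulting table multi-label. A dataset used for training would also need the metric values for each entity and rows for entities with no smells, both of which are omitted here.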