Enhancing software quality: Python code smell detection using machine learning techniques and refactoring long methods using extract method algorithm

Ferdoshi, Jannatul; Abdullah, Shabab; Knobo, Kazi Zunayed Quader; Uddin, Mohammed Sharraf

Date

2024-05

Publisher

Brac University

Abstract

Python has witnessed substantial growth, establishing itself as one of the world’s most popular programming languages. Its applications span software and data science projects, supported by features such as classes, method chaining, lambda functions, and list comprehensions. However, this flexibility introduces the risk of code smells, which diminish software quality and complicate maintenance. While extensive research addresses code smells in Java, the Python landscape lacks comprehensive automated solutions. Our paper addresses this gap in two ways. First, we construct a dataset using Pysmell [16], a tool from the existing literature. Given a project directory, the tool identifies the Python files and produces comma-separated files listing the code smells present in them. We assemble a collection of GitHub projects and run the tool on it. We then select the comma-separated files for five code smells: Large Class, Long Method, Long Lambda Function, Long Parameter List, and Long Message Chain. These files are combined to produce a multi-label dataset of code smells. Ensemble techniques and neural networks are trained on the dataset to analyse how well machine learning models predict code smells from code metrics. Second, we design and build a simple automated refactoring algorithm that aims to reduce Long Method code smells by extracting large if-else statements, thereby improving overall software quality. In a landscape where automated detection and refactoring for Python code smells are nascent, our research contributes to both.
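The step of combining per-smell comma-separated files into a multi-label dataset can be illustrated with a minimal sketch. It is not the thesis's implementation: the two-column layout (`file`, `entity`) and the helper names `read_flagged` and `build_multilabel` are assumptions made for illustration, and the actual Pysmell output columns may differ. The sketch also covers only entities flagged by at least one report, so a real dataset would still need unflagged (all-zero) samples and metric columns added.

```python
import csv
import io

# The five smell types named in the abstract.
SMELLS = [
    "Large Class",
    "Long Method",
    "Long Lambda Function",
    "Long Parameter List",
    "Long Message Chain",
]


def read_flagged(csv_text):
    """Return the set of (file, entity) pairs listed in one smell report.

    Assumes a hypothetical layout with a header row: file,entity.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return {(row["file"], row["entity"]) for row in reader}


def build_multilabel(reports):
    """Combine per-smell reports into rows with one 0/1 column per smell.

    `reports` maps a smell name to the CSV text of its report; a smell
    with no report is treated as having flagged nothing.
    """
    flagged = {
        smell: read_flagged(reports.get(smell, "file,entity\n"))
        for smell in SMELLS
    }
    # Every entity flagged by at least one report becomes one row.
    entities = sorted(set().union(*flagged.values()))
    rows = []
    for file, entity in entities:
        row = {"file": file, "entity": entity}
        for smell in SMELLS:
            row[smell] = int((file, entity) in flagged[smell])
        rows.append(row)
    return rows


# Hypothetical input: two of the five reports, as CSV text.
example_reports = {
    "Long Method": "file,entity\napp.py,process\nutil.py,parse\n",
    "Long Parameter List": "file,entity\napp.py,process\n",
}
example_rows = build_multilabel(example_reports)
```

Here `app.py`/`process` carries two labels at once (Long Method and Long Parameter List), which is what makes the combined dataset multi-label rather than multi-class.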

Keywords

Python; Code refactoring; Machine learning; Software quality; Software maintenance; GitHub repositories

LC Subject Headings

Software maintenance--Data processing; Computer software--Quality control--Data processing; Software failures--Prevention--Data processing; Computer system failures; Software refactoring; Python (Computer program language)

Description

This thesis is submitted in partial fulfillment of the requirements for the degree of Bachelor of Science in Computer Science, 2024.

Catalogued from PDF version of thesis.

Includes bibliographical references (pages 65-68).

Department

Department of Computer Science and Engineering, Brac University

Type

Thesis
