Enhancing software quality: Python code smell detection using machine learning techniques and refactoring long methods using extract method algorithm

Ferdoshi, Jannatul; Abdullah, Shabab; Knobo, Kazi Zunayed Quader; Uddin, Mohammed Sharraf

dc.contributor.advisor	Azmain, Md. Aquib
dc.contributor.author	Ferdoshi, Jannatul
dc.contributor.author	Abdullah, Shabab
dc.contributor.author	Knobo, Kazi Zunayed Quader
dc.contributor.author	Uddin, Mohammed Sharraf
dc.date.accessioned	2024-11-25T05:27:09Z
dc.date.available	2024-11-25T05:27:09Z
dc.date.copyright	©2024
dc.date.issued	2024-05
dc.identifier.other	ID 20301193
dc.identifier.other	ID 20301005
dc.identifier.other	ID 20241020
dc.identifier.other	ID 20241018
dc.identifier.uri	http://hdl.handle.net/10361/24816
dc.description	This thesis is submitted in partial fulfillment of the requirements for the degree of Bachelor of Science in Computer Science, 2024.	en_US
dc.description	Catalogued from PDF version of thesis.
dc.description	Includes bibliographical references (pages 65-68).
dc.description.abstract	Python has grown substantially and is now one of the world’s most popular programming languages. It is used across a wide range of software and data science projects and offers features such as classes, method chaining, lambda functions, and list comprehensions. This flexibility, however, introduces the risk of code smells, which diminish software quality and complicate maintenance. While extensive research addresses code smells in Java, the Python landscape lacks comprehensive automated solutions. Our paper addresses this gap in two ways. First, we construct a dataset using Pysmell [16], a tool from the existing literature. Given a project directory, the tool identifies the Python files and produces comma-separated (CSV) files for the code smells present in those files. We collect a set of GitHub projects and run the tool on them. We then select the CSV files for five code smells (Large Class, Long Method, Long Lambda Function, Long Parameter List, and Long Message Chain) and combine them into a multi-label dataset of code smells. Ensemble techniques and neural networks are trained on this dataset to analyse how well machine learning models predict code smells from code metrics. Second, we design and build a simple automated refactoring algorithm that aims to reduce Long Method code smells by extracting large if-else statements into separate methods, thereby improving overall software quality. In a landscape where automated detection and refactoring of Python code smells are still nascent, our research contributes essential advancements.	en_US
dc.description.statementofresponsibility	Jannatul Ferdoshi
dc.description.statementofresponsibility	Shabab Abdullah
dc.description.statementofresponsibility	Kazi Zunayed Quader Knobo
dc.description.statementofresponsibility	Mohammed Sharraf Uddin
dc.format.extent	81 pages
dc.language.iso	en	en_US
dc.publisher	Brac University	en_US
dc.rights	Brac University theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission.
dc.subject	Python	en_US
dc.subject	Code refactoring	en_US
dc.subject	Machine learning	en_US
dc.subject	Software quality	en_US
dc.subject	Software maintenance	en_US
dc.subject	GitHub repositories	en_US
dc.subject.lcsh	Software maintenance--Data processing.
dc.subject.lcsh	Computer software--Quality control--Data processing.
dc.subject.lcsh	Software failures--Prevention--Data processing.
dc.subject.lcsh	Computer system failures.
dc.subject.lcsh	Software refactoring.
dc.subject.lcsh	Python (Computer program language).
dc.title	Enhancing software quality: Python code smell detection using machine learning techniques and refactoring long methods using extract method algorithm	en_US
dc.type	Thesis	en_US
dc.contributor.department	Department of Computer Science and Engineering, Brac University
dc.description.degree	B.Sc. in Computer Science
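
The dataset-construction step described in the abstract (combining one CSV file per code smell into a single multi-label dataset) could be sketched roughly as follows. This is an illustrative sketch only, not the thesis's implementation: the column names `entity` and `smelly`, the smell keys, and the function name `combine_smell_labels` are all hypothetical, and the actual output format of Pysmell is not reproduced here.

```python
import csv
import io


def combine_smell_labels(csv_text_by_smell):
    """Merge per-smell CSV texts into one multi-label table.

    Assumption (hypothetical layout): each CSV has an `entity` column
    identifying a code element and a `smelly` column holding 0 or 1.
    Returns {entity: {smell_name: 0 or 1}}.
    """
    combined = {}
    for smell, text in csv_text_by_smell.items():
        for row in csv.DictReader(io.StringIO(text)):
            # First time an entity is seen, start with every label set to 0.
            labels = combined.setdefault(
                row["entity"], {name: 0 for name in csv_text_by_smell}
            )
            labels[smell] = int(row["smelly"])
    return combined


# Toy input with two of the five smells; the values are made up for illustration.
demo = {
    "long_method": "entity,smelly\na.py:f,1\nb.py:g,0\n",
    "long_parameter_list": "entity,smelly\na.py:f,1\nc.py:h,1\n",
}
result = combine_smell_labels(demo)
```

Each entity ends up with one binary label per smell, which is the shape a multi-label classifier expects; an entity missing from a given smell's file is treated as not exhibiting that smell, which is itself an assumption of this sketch.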

Files in this item

Name: 20241018, 20241020, 20301005, ...
Size: 4.615Mb
Format: PDF


This item appears in the following Collection(s)

Thesis & Report, BSc (Computer Science and Engineering) [1483]
