Thesis defences

PhD Oral Exam - Mohammed Shehab, Electrical and Computer Engineering

Techniques to Enhance Just-In-Time Software Defect Prediction Models


Date & time
Thursday, March 28, 2024
9 a.m. – 12 p.m.
Cost

This event is free

Organization

School of Graduate Studies

Contact

Nadeem Butt

Where

Engineering, Computer Science and Visual Arts Integrated Complex
1515 St. Catherine W.
Room 005.251

Wheelchair accessible

Yes

When studying for a doctoral degree (PhD), candidates submit a thesis that provides a critical review of the current state of knowledge of the thesis subject as well as the student’s own contributions to the subject. The distinguishing criterion of doctoral graduate research is a significant and original contribution to knowledge.

Once accepted, the candidate presents the thesis orally. This oral exam is open to the public.

Abstract

Software defects can have significant consequences, degrading system performance and causing critical failures. The objective of Just-In-Time Software Defect Prediction (JIT-SDP) techniques is to identify potential defects at an early stage of development, thereby enhancing the reliability and maintainability of software. This thesis contributes novel advancements to JIT-SDP, specifically addressing the challenges of project clusters, data imbalance, and classifier combination. All contributions are evaluated using diverse software projects and 34 datasets, encompassing a total of 259k commits.

The first contribution introduces ClusterCommit, a JIT-SDP approach tailored for project clusters sharing libraries and functionalities. Unlike traditional methods, ClusterCommit employs a machine learning model trained on commits from various projects within a cluster. The study evaluates six machine learning and three deep learning models. The results reveal noteworthy improvements in mean Area Under the Curve (AUC), ranging from 4% to 12%, most prominently for complex models such as Random Forest (RF) and Support Vector Machine (SVM) when dealing with large clusters. In contrast, simpler models like Naive Bayes (NB), Logistic Regression (LR), Decision Tree (DT), and k-Nearest Neighbors (k-NN) do not perform as well when applied to clusters of projects.
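The idea of training on commits pooled from a cluster, and of scoring predictions with AUC, can be illustrated with a minimal Python sketch. This is not the thesis's implementation: the `Commit` type and the functions `pool_cluster_commits` and `auc` are hypothetical names introduced here, and the sketch assumes each commit is already reduced to a numeric feature vector and a binary label.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical representation: (feature vector, label), label 1 = defect-inducing.
Commit = Tuple[List[float], int]


def pool_cluster_commits(
    projects: Dict[str, List[Commit]], cluster: Sequence[str]
) -> List[Commit]:
    """Concatenate the commits of every project in the cluster into one training set."""
    pooled: List[Commit] = []
    for name in cluster:
        pooled.extend(projects[name])
    return pooled


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a randomly chosen defective commit is
    scored higher than a randomly chosen clean one (ties count as 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one commit of each class")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))
```

Under these assumptions, any classifier could be trained on the output of `pool_cluster_commits` rather than on a single project's commits, and its predicted scores on held-out commits compared against the labels with `auc`.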


© Concordia University