Machine learning is data-hungry. Really hungry. Training models often requires detailed bug information. Unfortunately, bug trackers tend to record when bugs were fixed but not when they were introduced. The default solution to this problem in software engineering research is the SZZ algorithm. We present SZZ Unleashed, a new implementation available on GitHub.
This paper originates in a highly successful MSc thesis project conducted by Oscar Svensson and Kristian Berg at Axis Communications in Lund. It started when Sven Selberg from Axis wanted his summer intern Oscar to explore what to do with bugs that the automated testing does not catch. Could machine learning be used to detect the commits that deserve extra code review? We started to explore what the research community refers to as just-in-time bug prediction.
Machine learning wants bug data
Machine learning (ML) is certainly a hot topic in software engineering (SE), just like everywhere else. In SE, it is far from a recent topic – it has been used in numerous studies. We have also used it in our research before the current deep learning hype, e.g., with Ericsson to automate bug assignment. Making bug predictions has been a popular topic in the community. To do supervised learning, however, you need to train your models on plenty of bug data.
Machine learning projects typically come down to finding a large set of reliable training data. Our project with Axis was no exception. To train an ML model on buggy commits, we needed plenty of commits that have been annotated as buggy. But where do you get that? The bug tracker records when commits fix bugs, but when were they introduced? ML in SE often wants to train on the bug-introducing commits.
The SZZ algorithm in a nutshell
The go-to method for SE researchers when bug-introducing commits are missing is the SZZ algorithm by Sliwerski, Zimmermann, and Zeller. SZZ has been refined somewhat, but the original idea remains: a heuristic approach to deduce which commits introduced bugs that were later fixed. SZZ is a two-step process. First, bug fixes are found in the issue tracker and linked to commits, typically an easy task that can be solved by regular expressions matching a project’s commit message conventions. Second, git blame (or its svn counterpart back in the day) is used to identify a set of possibly bug-introducing commits, and this list of suspects is then pruned to produce the end result. We realized that the thesis project at Axis needed exactly this.
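The first SZZ step, linking fixed bugs to commits via commit-message conventions, can be sketched as below. This is a minimal illustration, not code from SZZ Unleashed: it assumes Jenkins-style issue keys such as `JENKINS-12345`, and the function name `find_fix_commits` and the sample commits are hypothetical. Other projects need their own pattern.

```python
import re

# Assumed issue-key convention (Jenkins-style "JENKINS-<number>").
# Adapt this pattern to the commit message conventions of the project at hand.
ISSUE_KEY = re.compile(r"\bJENKINS-(\d+)\b", re.IGNORECASE)


def find_fix_commits(commits, bug_ids):
    """Map each fixed bug id to the commits whose message mentions it.

    commits: iterable of (sha, message) pairs, e.g. parsed from `git log`.
    bug_ids: set of issue numbers (as strings) the issue tracker lists as fixed bugs.
    """
    fixes = {}
    for sha, message in commits:
        for match in ISSUE_KEY.finditer(message):
            issue = match.group(1)
            if issue in bug_ids:  # ignore features, docs tasks, unknown keys
                fixes.setdefault(issue, []).append(sha)
    return fixes


# Hypothetical commit history.
commits = [
    ("a1b2c3", "[FIXED JENKINS-101] Handle null build parameters"),
    ("d4e5f6", "Refactor logging"),
    ("0718ab", "JENKINS-202: add docs"),
]
print(find_fix_commits(commits, bug_ids={"101"}))
# → {'101': ['a1b2c3']}
```

Filtering on `bug_ids` matters: a commit mentioning an issue key is only a bug fix if the tracker classifies that issue as a bug.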
The figure below shows a simple example of the SZZ algorithm in action. First, commit 3 is identified as the fix for Bug A. Then git blame is used to find where lines 0 and 1 were created (or last modified), in this case commits 1 and 2. Since Bug A was reported before commit 2, that commit cannot have introduced the bug and is pruned. Commit 1 remains and is regarded as bug-introducing. Additional examples are described in the paper.
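The date-based pruning in the example above can be sketched as a simple filter. This is an illustration under stated assumptions, not the SZZ Unleashed implementation: the blamed commits are assumed to be already collected (in a git repository, e.g. with `git blame --porcelain -L <start>,<end> <fix>^ -- <path>` on the parent of the fixing commit), and the timestamps below are made up to mirror the figure.

```python
from datetime import datetime


def prune_candidates(blamed, bug_reported):
    """Keep only blamed commits made before the bug was reported.

    blamed: dict mapping commit id -> commit timestamp, as found by blaming
            the lines touched by the fix.
    bug_reported: timestamp of the bug report in the issue tracker.
    A commit made after the report cannot have introduced the reported bug.
    """
    return {commit for commit, when in blamed.items() if when < bug_reported}


# Hypothetical timestamps: commit 2 lands after Bug A is reported.
blamed = {
    "commit 1": datetime(2019, 1, 10),
    "commit 2": datetime(2019, 3, 5),
}
print(prune_candidates(blamed, bug_reported=datetime(2019, 2, 1)))
# → {'commit 1'}
```

Refined SZZ variants add further filters (for example ignoring whitespace or comment-only changes); this sketch covers only the report-date rule.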
SZZ Unleashed
While SZZ has been used in many papers, a recent review article by Rodriguez-Perez et al. reported that few SZZ implementations are publicly available. For the thesis project, we concluded that we had to implement our own tool. Unfortunately, if every researcher who needs SZZ must do this, a considerable amount of engineering effort is wasted.
SZZ Unleashed has a core implemented in Java and supporting scripts in Python. JSON files are created in intermediate steps as shown in the figure below. SZZ needs some supporting infrastructure to work, so we have provided detailed instructions in the GitHub ReadMe. A prepared Docker container also helps you get started. On August 27, 2019, the SZZ Unleashed GitHub repo had 21 forks and 33 stars.
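For machine learning, the end product of such a pipeline is a label per commit. The sketch below turns fix/introducer pairs into binary labels. The JSON schema here (a list of `[bug_fixing_commit, bug_introducing_commit]` pairs) and the function `label_commits` are assumptions for illustration only; the actual intermediate file names and formats of SZZ Unleashed are documented in its ReadMe.

```python
import json

# Hypothetical intermediate output: [bug_fixing_commit, bug_introducing_commit] pairs.
# The real SZZ Unleashed files may use different names and structure.
pairs_json = '[["f1", "b1"], ["f2", "b1"], ["f3", "b2"]]'


def label_commits(all_commits, pairs):
    """Return {commit: 1 if bug-introducing else 0} for supervised learning."""
    introducers = {introducer for _fix, introducer in pairs}
    return {commit: int(commit in introducers) for commit in all_commits}


labels = label_commits(["b1", "b2", "c3", "c4"], json.loads(pairs_json))
print(labels)
# → {'b1': 1, 'b2': 1, 'c3': 0, 'c4': 0}
```

Note that one commit can introduce bugs later fixed by several commits (`b1` above); the set collapses such duplicates into a single positive label.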
Bug prediction at Axis
What happened to the bug prediction at Axis then? While most of the thesis project revolved around SZZ Unleashed, we had some time to train a random forest for binary classification of commits to the Jenkins repository, a project to which Axis regularly contributes. Note that the Jenkins dataset extracted using SZZ Unleashed is imbalanced: only roughly 4% of the commits are bug-introducing. Binary classification on this data means that we need to tackle the class imbalance problem. Oversampling with SMOTE and training a random forest classifier resulted in 12% precision and 21% recall for 10-fold cross-validation. That is probably not accurate enough to be useful… but we still consider it a proof-of-concept.
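SMOTE balances the classes by synthesizing new minority samples rather than duplicating existing ones. The dependency-free sketch below shows the core interpolation idea; it is a simplified illustration, not the configuration used in the study, and the function name `smote` and the two-feature sample commits are hypothetical.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """SMOTE-style oversampling by interpolating between minority samples.

    minority: list of feature tuples from the minority class (bug-introducing commits).
    Each synthetic sample is x + lam * (n - x), where x is a random minority
    sample, n is one of its k nearest minority neighbours (Euclidean distance),
    and lam is drawn uniformly from [0, 1).
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        x = minority[i]
        others = minority[:i] + minority[i + 1:]
        neighbours = sorted(others, key=lambda m: math.dist(x, m))[:k]
        n = rng.choice(neighbours)
        lam = rng.random()
        # The new point lies on the segment between x and its neighbour n.
        synthetic.append(tuple(xi + lam * (ni - xi) for xi, ni in zip(x, n)))
    return synthetic


# Hypothetical bug-introducing commits with two features (say, churn and files touched).
minority = [(10.0, 2.0), (12.0, 3.0), (30.0, 5.0)]
new_samples = smote(minority, n_synthetic=4, k=2)
print(len(new_samples))  # → 4
```

One design point matters in cross-validation: oversample only the training folds, never the test fold, or synthetic points derived from test commits leak into training. In practice, imbalanced-learn's `SMOTE` inside an `imblearn.pipeline.Pipeline` with scikit-learn's `RandomForestClassifier` handles this, because samplers in that pipeline run only during fitting.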
We also ran a time-sensitive evaluation, i.e., making sure that we always trained on data from the past, which is often critical, as we have reported before. When the timestamps were treated carefully, recall was roughly cut in half. Our results confirm previous findings by Tan et al. (2015): disregarding time information might give overly positive results. Tan et al. showed this for precision, but we now show that it might also happen for recall.
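A time-sensitive evaluation replaces random folds with splits that respect commit order. The sketch below shows one such scheme, forward chaining, where every fold trains on older commits and tests on newer ones. It is an illustrative assumption, not necessarily the exact setup of the study; the function name `time_ordered_folds` and the sample timestamps are hypothetical. scikit-learn's `TimeSeriesSplit` offers a similar scheme for arrays already sorted by time.

```python
def time_ordered_folds(commits, n_folds):
    """Forward-chaining splits: no training commit is newer than any test commit.

    commits: list of (timestamp, commit_id) tuples. After sorting by time, the
    list is cut into n_folds + 1 consecutive chunks; fold i trains on chunks
    0..i-1 and tests on chunk i (the last fold also takes any leftover commits).
    """
    ordered = sorted(commits)
    size = len(ordered) // (n_folds + 1)
    if size == 0:
        raise ValueError("not enough commits for the requested number of folds")
    for i in range(1, n_folds + 1):
        end = (i + 1) * size if i < n_folds else len(ordered)
        yield ordered[: i * size], ordered[i * size : end]


# Hypothetical commits with integer timestamps, deliberately out of order.
commits = [(t, f"c{t}") for t in [5, 1, 4, 2, 3, 6, 7]]
for train, test in time_ordered_folds(commits, n_folds=2):
    print([c for _, c in train], "->", [c for _, c in test])
# ['c1', 'c2'] -> ['c3', 'c4']
# ['c1', 'c2', 'c3', 'c4'] -> ['c5', 'c6', 'c7']
```

Random 10-fold cross-validation, by contrast, lets the model train on commits made after the ones it is tested on, which is one way results can become overly positive.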
While we don’t know if our bug prediction results are good enough to be useful, we decided to report the relative importance of the features we used to represent commits – it might be interesting for someone in the future. The figure below shows what mattered when training our classifier. The results are in line with previous research. Churn is a strong predictor of bugs, together with the size of both the change and the target, as well as the number of people who previously made changes. We think all these features make sense as they reflect the complexity of the code.
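To make these feature families concrete, the sketch below computes churn and change-size features from `git log --numstat` lines, plus a count of distinct prior developers. The feature names and exact definitions here are hypothetical and may differ from the feature set used in the study; the history format is also an assumption. A target-size feature would additionally need each file's length before the change, e.g. via `git show <commit>^:<path>`.

```python
def churn_features(numstat_lines):
    """Size/churn features for one commit from `git log --numstat` lines.

    Each line has the form "<added>\t<deleted>\t<path>"; binary files report
    "-" instead of numbers and are counted as zero changed lines here.
    """
    added = deleted = 0
    files = set()
    for line in numstat_lines:
        a, d, path = line.split("\t", 2)
        added += int(a) if a != "-" else 0
        deleted += int(d) if d != "-" else 0
        files.add(path)
    return {"lines_added": added, "lines_deleted": deleted,
            "churn": added + deleted, "files_touched": len(files)}


def num_prior_developers(history, paths):
    """Count distinct authors who previously changed any of `paths`.

    history: hypothetical list of (author, set_of_paths) for earlier commits.
    """
    return len({author for author, touched in history if touched & paths})


stat = ["10\t2\tcore/src/Build.java", "3\t0\tcore/src/Run.java", "-\t-\tdocs/logo.png"]
print(churn_features(stat))
# → {'lines_added': 13, 'lines_deleted': 2, 'churn': 15, 'files_touched': 3}
history = [("alice", {"core/src/Build.java"}), ("bob", {"README.md"}),
           ("carol", {"core/src/Run.java"})]
print(num_prior_developers(history, {"core/src/Build.java", "core/src/Run.java"}))
# → 2
```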
Implications for research
- SZZ should be a piece of commodity software. Researchers can now use SZZ Unleashed instead of reimplementing the algorithm.
- We confirm that bug prediction evaluations should be time-sensitive; otherwise both precision and recall can be overly positive.
- The research community is welcome to contribute to the GitHub repository.
Implications for practice
- The SZZ algorithm can be used to characterize bug-introducing commits in the repository.
- Training a classifier can be useful to highlight when a particular commit needs careful code review.
- For the Jenkins project, the most important features when predicting bugs in commits are related to churn, size, and the number of people involved.
Markus Borg, Oscar Svensson, Kristian Berg, and Daniel Hansson. SZZ Unleashed: An Open Implementation of the SZZ Algorithm - Featuring Example Usage in a Study of Just-in-Time Bug Prediction for the Jenkins Project. In Proc. of the Workshop on Machine Learning Techniques for Software Quality Evolution (MaLTeSQuE), pp. 7-12, 2019.
Abstract
Numerous empirical software engineering studies rely on detailed information about bugs. While issue trackers often contain information about when bugs were fixed, details about when they were introduced to the system are often absent. As a remedy, researchers often rely on the SZZ algorithm as a heuristic approach to identify bug-introducing software changes. Unfortunately, as reported in a recent systematic literature review, few researchers have made their SZZ implementations publicly available. Consequently, there is a risk that research effort is wasted as new projects based on SZZ output need to initially reimplement the approach. Furthermore, there is a risk that newly developed (closed source) SZZ implementations have not been properly tested, thus conducting research based on their output might introduce threats to validity. We present SZZ Unleashed, an open implementation of the SZZ algorithm for git repositories. This paper describes our implementation along with a usage example for the Jenkins project, and concludes with an illustrative study on just-in-time bug prediction. We hope to continue evolving SZZ Unleashed on GitHub, and warmly invite the community to contribute.