Linked bugs and the power of networks
Bug reports are not independent pieces of text in a database. Bug trackers commonly let developers flag relations to other bugs, but these connections are mainly used for quick browsing. Extracting all such links and constructing bug networks can greatly support information retrieval and recommendation systems.
This paper has a rather interesting history. First, I did a network analysis of bug reports in the Android development project as part of the MSR Challenge 2012. My challenge report was not accepted for publication, but I was quite sure there was something interesting in the submission. I complemented the work with an analysis of bug networks in a proprietary bug tracker and submitted it to CSMR’13, where it was accepted as a full paper. I like this paper very much and it has been a cornerstone of my PhD work, but unfortunately it has hardly been cited.
Hello networks, are you there?
The first research question was truly exploratory: what kind of bug networks emerge when we do link mining? For the Android dataset, we extracted a (directed) link from one bug to another if the bug’s comments (i.e., the developers’ discussion) contained a reference to the other bug. For the proprietary dataset, we simply relied on the information in a dedicated “related issues” field. We definitely discovered networks of issues; examples from Android are presented below.
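The comment-based link mining described above can be sketched roughly as follows. This is a minimal illustration and not the extraction script used in the study: the reference pattern (`issue 1234`, `bug #1234`), the function name `mine_links`, and the sample comments are all assumptions made for the example.

```python
import re
from collections import defaultdict

# Hypothetical pattern: assumes references are written like "issue 1234"
# or "bug #1234". Real trackers may use other conventions (e.g. URLs).
BUG_REF = re.compile(r"\b(?:issue|bug)\s*#?(\d+)", re.IGNORECASE)


def mine_links(comments_by_bug):
    """Map each bug ID to the set of other bug IDs mentioned in its comments."""
    links = defaultdict(set)
    for bug_id, comments in comments_by_bug.items():
        for comment in comments:
            for match in BUG_REF.finditer(comment):
                target = int(match.group(1))
                if target != bug_id:  # ignore self-references
                    links[bug_id].add(target)
    return dict(links)


# Made-up sample data, for illustration only.
comments_by_bug = {
    101: ["Looks like a duplicate of issue 205.", "See also bug #300"],
    205: ["Same as Issue 101?"],
    300: ["No references here."],
}
links = mine_links(comments_by_bug)
```

Each key in `links` is the source of one or more directed edges, so the dictionary can be fed directly into a graph library or flattened into an edge list.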
Once you’ve discovered networked structures, there is a whole bunch of standard network measures that can be calculated. Many of them have an interpretation in social network analysis, e.g., how tightly knit a group of people is, who the authority is, and which cliques exist. For bug networks, such interpretations are not yet available. It should be possible to find similar patterns in bug networks, but more studies reporting descriptive statistics are needed. The table below shows measures for the proprietary dataset (to the left) and Android.
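To give a feel for what such descriptive statistics look like, here is a small sketch that computes a few common ones (node and edge counts, density, weakly connected components, and the most-referenced bug) from a directed edge list. The selection of measures and the name `describe_network` are assumptions for this example; they are not necessarily the measures reported in the paper's table.

```python
from collections import defaultdict


def describe_network(edges):
    """Compute basic descriptive measures for a directed bug network."""
    nodes = {n for edge in edges for n in edge}
    in_degree = defaultdict(int)
    neighbours = defaultdict(set)  # undirected view, used for components
    for source, target in edges:
        in_degree[target] += 1
        neighbours[source].add(target)
        neighbours[target].add(source)

    # Weakly connected components via an iterative depth-first search.
    seen, components = set(), []
    for start in nodes:
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(neighbours[node] - component)
        seen |= component
        components.append(component)

    n = len(nodes)
    return {
        "nodes": n,
        "edges": len(edges),
        # Directed density: share of possible ordered pairs that are linked.
        "density": len(edges) / (n * (n - 1)) if n > 1 else 0.0,
        "components": len(components),
        "largest_component": max((len(c) for c in components), default=0),
        # (bug ID, in-degree) of the most-referenced bug, if any.
        "max_in_degree": max(in_degree.items(), key=lambda kv: kv[1], default=None),
    }


# Made-up edge set: three bugs point at bug 2, plus one isolated pair.
edges = {(1, 2), (3, 2), (4, 2), (5, 6)}
stats = describe_network(edges)
```

In-degree is only the simplest notion of centrality; measures such as betweenness or PageRank would need a proper graph library.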
Our next step toward understanding the networks involved qualitative research. We selected six clusters of bugs that looked quite different; the one denoted B is visualized below. Then we studied the links and the bug reports in detail to see what “caused” them. We found that certain phenomena appear to create very characteristic network structures. For example, copy/paste comments used to highlight duplicated reports in multiple Android bugs create “bug stars”, i.e., one central bug and several “orbiting” nodes. Note that we distinguish between cloned, duplicated, and related bug reports in this study. Related bugs have some kind of relation, duplicated bugs report the same issue, and cloned bugs contain exactly the same bug description. Clones are thus a special type of duplicate, and they are not that uncommon in the Android dataset. The table below summarizes our qualitative analysis.
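A “bug star” can be spotted automatically with a simple heuristic. The sketch below treats the network as undirected and reports every bug that has at least `min_leaves` neighbours whose only link is to that bug. Both the threshold and this operational definition are assumptions for illustration; the study identified such structures through manual, qualitative analysis.

```python
from collections import defaultdict


def find_bug_stars(edges, min_leaves=3):
    """Return {centre: leaves} for nodes with at least min_leaves 'orbiting' bugs."""
    neighbours = defaultdict(set)
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)

    stars = {}
    for node, adjacent in neighbours.items():
        # A leaf is a neighbour whose one and only link goes to this node.
        leaves = {n for n in adjacent if neighbours[n] == {node}}
        if len(leaves) >= min_leaves:
            stars[node] = leaves
    return stars


# Made-up example: bugs 11-13 all point at bug 10, which also links to bug 20.
edges = [(11, 10), (12, 10), (13, 10), (10, 20), (20, 21)]
stars = find_bug_stars(edges)
```

Here bug 10 is reported as a star centre, while bug 20 is not because it has only one leaf.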
Unleashing the power of networks
So what is the point of all this network analysis? Well, networks are everywhere, and networks are powerful. Many advances in computer science rely on relations. Google’s PageRank for ranking search results is probably the most famous example, but there are many other applications: road networks in navigation systems, rating networks in recommendation systems, graph coloring algorithms in compilers, etc. In this paper we argue that bug networks can support navigation of large software development projects. The figure below shows an example of change impact analysis as part of bug resolution, i.e., tracing artifacts on the requirements and testing sides of the V-model. Direct tracing from a bug means traditional information seeking in the information space. Indirect tracing, on the other hand, means first finding similar bugs that have already been resolved and then investigating what was impacted when fixing them. This is a fundamental concept in the implementation of ImpRec, a recommendation system for change impact analysis.
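The idea of indirect tracing can be illustrated with a toy sketch: rank resolved bugs by similarity to the new one, then pool the artifacts that were impacted when the most similar ones were fixed. This is not how ImpRec is implemented; the word-overlap (Jaccard) similarity, the function names, the `top_k` parameter, and the artifact IDs are all assumptions chosen to keep the example small.

```python
from collections import Counter


def jaccard(text_a, text_b):
    """Word-overlap similarity between two bug descriptions (0.0 to 1.0)."""
    tokens_a, tokens_b = set(text_a.lower().split()), set(text_b.lower().split())
    union = tokens_a | tokens_b
    return len(tokens_a & tokens_b) / len(union) if union else 0.0


def recommend_impact(new_bug_text, resolved_bugs, top_k=2):
    """Suggest artifacts impacted by the top_k most similar resolved bugs."""
    ranked = sorted(
        resolved_bugs,
        key=lambda bug: jaccard(new_bug_text, bug["text"]),
        reverse=True,
    )
    impact = Counter()
    for bug in ranked[:top_k]:
        impact.update(bug["impacted"])
    # Artifacts touched by more of the similar bugs come first.
    return [artifact for artifact, _ in impact.most_common()]


# Made-up resolved bugs with hypothetical requirement and test case IDs.
resolved_bugs = [
    {"id": 1, "text": "crash when saving file to disk", "impacted": ["REQ-12", "TEST-7"]},
    {"id": 2, "text": "crash when saving large file", "impacted": ["REQ-12", "TEST-9"]},
    {"id": 3, "text": "login page layout broken", "impacted": ["REQ-40"]},
]
recommendations = recommend_impact("application crash when saving file", resolved_bugs)
```

A network-aware variant could also follow explicit links from the similar bugs to their neighbours before pooling the impacted artifacts.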
Implications for research
- Bug trackers contain considerable bug networks that can be used for tool support – for example supporting navigation of large information spaces.
- Network centrality measures can be used to identify particularly central bugs – interesting both to plan ongoing maintenance work and as input to retrospective meetings.
- Links between bugs in a bug tracker represent different levels of “relatedness”, including pointers to bug reports describing the same behavior (duplicates) and identical bug reports (clones).
Implications for practice
- Relations can be valuable: make sure that developers actually record links in the bug tracker, including related issues, duplicates, and clones.
Markus Borg, Dietmar Pfahl, and Per Runeson. In Proc. of the 17th European Conference on Software Maintenance and Reengineering, pp. 79-88, 2013.
Completely analyzed and closed issue reports in software development projects, particularly in the development of safety-critical systems, often carry important information about issue-related change locations. These locations may be in the source code, as well as traces to test cases affected by the issue, and related design and requirements documents. In order to help developers analyze new issues, knowledge about issue clones and duplicates, as well as other relations between the new issue and existing issue reports would be useful. This paper analyses, in an exploratory study, issue reports contained in two Issue Management Systems (IMS) containing approximately 20,000 issue reports. The purpose of the analysis is to gain a better understanding of relationships between issue reports in IMSs. We found that link-mining explicit references can reveal complex networks of issue reports. Furthermore, we found that textual similarity analysis might have the potential to complement the explicitly signaled links by recommending additional relations. In line with work in other fields, links between software artifacts have a potential to improve search and navigation in large software engineering projects.