PhD Thesis

Full-text PDF is available here for download!

In large software development projects, the sheer volume of incoming issue reports can be daunting. But what if we could let the huge numbers work for us? In this thesis, we discuss how to use machine learning to find patterns in the issue inflow, and to tame the bugs so that they become actionable decision support.

From Bugs to Decision Support – Leveraging Historical Issue Reports in Software Evolution

Lic. thesis: Do not evaluate IR in the cave ONLY.

The thesis makes major contributions to understanding the state of practice and the state of the art in managing large amounts of information in software engineering, through an industrial case study and a systematic literature review, respectively. This first part of the PhD thesis is largely based on my licentiate thesis, in which I focused on evaluations of applied information retrieval (IR) solutions in software engineering – also worth a look!

 

PhD thesis – Tame the bugs and evaluate in front of the cave.

Beyond characterizing the problem, we develop and evaluate solutions based on our ideas. We show that ensemble-based machine learning is a feasible way to automate issue assignment in five industrial project contexts. We also develop a recommendation system for change impact analysis that lets you follow in the footsteps of previous engineers in the evolution of a safety-critical system. The recommendation system is evaluated in two industrial development teams, one in Sweden and one in India. Finally, we present an experiment framework for tuning, i.e., finding good parameter settings for, complex software engineering tool support such as the tools developed in this thesis.
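To make the idea of ensemble-based issue assignment concrete, here is a minimal sketch in Python with scikit-learn. It is purely illustrative: the thesis work was not built on this code, and the choice of base classifiers, the TF-IDF features, and the example issues and team labels are all assumptions.

```python
# Minimal sketch of ensemble-based issue assignment (illustrative only).
# Assumption: each issue report is a free-text description labeled with the
# team that eventually resolved it.
from sklearn.ensemble import VotingClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Hypothetical training data: historical issue reports and the teams that fixed them.
issues = [
    "Null pointer exception in payment gateway on timeout",
    "UI freezes when opening the settings dialog",
    "Memory leak in the diagnostics logging service",
]
teams = ["backend", "frontend", "platform"]

# Combine several base classifiers; the ensemble assigns the team by majority vote.
ensemble = VotingClassifier(
    estimators=[
        ("nb", MultinomialNB()),
        ("lr", LogisticRegression(max_iter=1000)),
        ("svm", LinearSVC()),
    ],
    voting="hard",
)

# Represent issue text as TF-IDF vectors, then let the ensemble vote.
model = make_pipeline(TfidfVectorizer(), ensemble)
model.fit(issues, teams)

print(model.predict(["Crash with stack trace in payment gateway"]))
```

The appeal of the ensemble is that no single classifier has to be right for every kind of issue report; with a growing history of resolved issues, the combined vote tends to be more robust than any individual learner.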

The full-text PDF is available here for download. I still have plenty of physical copies, just ask for one and I’ll mail it to you!

Leverage the large amounts of bugs – they can be useful!
Formalities

I defended my PhD thesis at the Dept. of Computer Science, Lund University on May 8, 2015. Prof. Serge Demeyer, a software evolution researcher from the University of Antwerp, was the faculty opponent, and the examination committee consisted of Prof. Birger Larsen, Aalborg University, Prof. Rickard Torkar, Chalmers and the University of Gothenburg, and Dr. Piotr Tomaszewski, Intel Corporation. Thus, the committee represented the perspectives of i) information retrieval, ii) empirical software engineering, and iii) industrial applicability, respectively.

Supervisors: Prof. Per Runeson and Prof. Björn Regnell.

Markus Borg, From Bugs to Decision Support – Leveraging Historical Issue Reports in Software Evolution, Lund University, 2015. (open access)
Abstract
Software developers in large projects work in complex information landscapes and staying on top of all relevant software artifacts is an acknowledged challenge. As software systems often evolve over many years, a large number of issue reports is typically managed during the lifetime of a system, representing the units of work needed for its improvement, e.g., defects to fix, requested features, or missing documentation. Efficient management of incoming issue reports requires the successful navigation of the information landscape of a project.
 
In this thesis, we address two tasks involved in issue management: Issue Assignment (IA) and Change Impact Analysis (CIA). IA is the early task of allocating an issue report to a development team, and CIA is the subsequent activity of identifying how source code changes affect the existing software artifacts. While IA is fundamental in all large software projects, CIA is particularly important to safety-critical development.
 
Our solution approach, grounded on surveys of industry practice as well as scientific literature, is to support navigation by combining information retrieval and machine learning into Recommendation Systems for Software Engineering (RSSE). While the sheer number of incoming issue reports might challenge the overview of a human developer, our techniques instead benefit from the availability of ever-growing training data. We leverage the volume of issue reports to develop accurate decision support for software evolution.
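As a rough illustration of what "combining information retrieval and machine learning" can look like in an RSSE, the sketch below indexes historical issue reports with TF-IDF and recommends the artifacts that were touched when the most similar past issues were resolved. It is a simplified assumption of how such a recommender could work, not the implementation evaluated in the thesis; all names and data are hypothetical.

```python
# Illustrative sketch: recommend artifacts for a new issue based on the
# artifacts touched by textually similar historical issues, using cosine
# similarity over TF-IDF vectors (assumed setup).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical issue history: report text and the artifacts changed to resolve it.
history = [
    ("Brake controller reports stale sensor values", ["brake_ctrl.c", "sensor_if.c"]),
    ("Watchdog reset during firmware update", ["updater.c", "watchdog.c"]),
    ("Stale sensor cache after power cycle", ["sensor_if.c", "cache.c"]),
]

texts = [text for text, _ in history]
vectorizer = TfidfVectorizer()
index = vectorizer.fit_transform(texts)  # IR index of past issue reports

def recommend(new_issue: str, top_n: int = 2) -> list[str]:
    """Rank artifacts by the textual similarity of the issues that changed them."""
    sims = cosine_similarity(vectorizer.transform([new_issue]), index)[0]
    scores: dict[str, float] = {}
    for sim, (_, artifacts) in zip(sims, history):
        for artifact in artifacts:
            scores[artifact] = scores.get(artifact, 0.0) + sim
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

print(recommend("Sensor values not refreshed by brake controller"))
```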
 
We evaluate our proposals both by deploying an RSSE in two development teams, and by simulation scenarios, i.e., we assess the correctness of the RSSEs' output when replaying the historical inflow of issue reports. In total, more than 60,000 historical issue reports are involved in our studies, originating from the evolution of five proprietary systems for two companies. Our results show that RSSEs for both IA and CIA can help developers navigate large software projects, in terms of locating development teams and software artifacts. Finally, we discuss how to support the transfer of our results to industry, focusing on addressing the context dependency of our tool support by systematically tuning parameters to a specific operational setting.
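The replay-style simulation can be pictured roughly as follows: issue reports are ordered chronologically, the recommender is trained on everything seen so far, and its suggestion for the next incoming report is compared with the historical outcome. The snippet below is a schematic sketch under assumed data, a single accuracy metric, and a simple classifier; it is not the evaluation framework used in the thesis.

```python
# Schematic replay of a historical issue inflow (illustrative assumptions only):
# train on all issues seen so far, predict the team for the next issue,
# and compare with the historically recorded assignment.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical inflow, already sorted by submission date: (text, resolving team).
inflow = [
    ("Login page rejects valid credentials", "frontend"),
    ("Batch job exceeds memory limit at night", "backend"),
    ("Crash in report rendering after upgrade", "frontend"),
    ("Timeout when syncing user profiles", "backend"),
    ("Broken layout on the dashboard widget", "frontend"),
]

hits, attempts = 0, 0
for i in range(2, len(inflow)):  # require some history before predicting
    past_texts = [text for text, _ in inflow[:i]]
    past_teams = [team for _, team in inflow[:i]]
    model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
    model.fit(past_texts, past_teams)
    predicted = model.predict([inflow[i][0]])[0]
    hits += int(predicted == inflow[i][1])
    attempts += 1

print(f"Replay accuracy: {hits / attempts:.2f}")
```

Because the model is retrained at every step, the replay also shows how recommendation quality changes as the training set grows, which is one way to reason about how much history a project needs before such tool support pays off.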