To be blogged…
Markus Borg, Per Runeson, and Lina Brodén. In Proc. of the 16th International Conference on Evaluation & Assessment in Software Engineering, pp. 111-120, Ciudad Real, Spain, 2012. (link, preprint)
Background: Development of complex, software intensive systems generates large amounts of information. Several researchers have developed tools implementing information retrieval (IR) approaches to suggest traceability links among artifacts. Aim: We explore the consequences of the fact that a majority of the evaluations of such tools have been focused on benchmarking of mere tool output. Method: To illustrate this issue, we have adapted a framework of general IR evaluations to a context taxonomy specifically for IR-based traceability recovery. Furthermore, we evaluate a previously proposed experimental framework by conducting a study using two publicly available tools on two datasets originating from development of embedded software systems. Results: Our study shows that even though both datasets contain software artifacts from embedded development, the characteristics of the two datasets differ considerably, and consequently the traceability outcomes. Conclusions: To enable replications and secondary studies, we suggest that datasets should be thoroughly characterized in future studies on traceability recovery, especially when they can not be disclosed. Also, while we conclude that the experimental framework provides useful support, we argue that our proposed context taxonomy is a useful complement. Finally, we discuss how empirical evidence of the feasibility of IR-based traceability recovery can be strengthened in future research.