Goal-Oriented Mutation Testing with Focal Methods

Focused mutant killing

Mutation testing takes too much time? Sure, if you run all test cases for each mutant. Why not focus better? Run the test cases that actually test the code where you introduced the mutant – use focal methods to enable goal-oriented mutation testing.

This is the second paper by the PhD student Sten Vercammen in Antwerp, and I’m happy to have been involved. Sten’s research is all about enabling industrial adoption of mutation testing by pursuing the vision “do fewer, do smarter, and do faster”. The idea we explored in this paper, thanks to Serge Demeyer’s good connections with researchers in Switzerland, was to use focal methods (the backbone of a test scenario inside a unit test case) to limit the scope of mutation testing. In our initial evaluation, this works really well in practice – and we now refer to it as “goal-oriented mutation testing”.

From Focal methods to goal-oriented mutations

We use focal methods to establish fine-grained trace links (i.e., traceability links) between production code and test code. The trace links show which parts of the production code individual test cases actually test. This can of course be used to prioritize traditional unit test execution, but in this study we were interested in targeted mutation testing – potentially reducing the scope of mutation testing radically.

Mohammad Ghafari, the second author, has developed an approach to automatically identify focal methods. This is very important, since distinguishing focal methods from other methods is hard to accomplish manually. Mohammad’s approach uses data flow analysis to capture information in test cases and source code, identifies system dependencies and constructs a partial system call graph – the output is focal methods for each test case.

In this paper, we propose to use focal methods to direct mutation testing. By only executing the test cases that actually are meant to test the method f, i.e., the focal method, we argue that we can reduce the number of test cases needed to kill mutants located in method f. This means a trade-off between reducing the execution time of mutation testing and its fault revealing power – but we believe the pros outweigh the cons.

Proof-of-concept using the Apache Ant project

We selected the Apache Ant project for our proof-of-concept. It’s a mature project (first commit in Jan 2000), consisting of more than 200 kLoC and roughly 1,800 test cases. We generated all possible mutants with the tool LittleDarwin and selected 423 of them, residing in 4 classes, for further analysis. We created an execution time benchmark by running the complete test suite for these 423 mutants. Then we did a focal method analysis to identify which test cases should be executed to kill each mutant; we did this manually to rule out tooling as a source of error.

The table below shows our results from goal-oriented mutation testing compared to the benchmark. For each class under study, the rows show three techniques: 1) full test suite is our benchmark, 2) class based is a mutation strategy for which only the test cases targeting the corresponding class are executed, and 3) focal methods is our proposed goal-oriented mutation testing.

The false negatives are mutants that survive because a technique considers only a limited number of test cases, but that the full test suite would have killed. AVG tests considered indicates how many test cases the technique would execute (on average) to kill the mutants. Run time indicates the time needed to execute all mutants (either until the mutant is killed or all considered test cases have been executed) and finally, speed-up indicates how much faster the technique is compared to running the complete test suite.

Results from goal-oriented mutation testing on Apache Ant.
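
As a rough illustration of how these four columns relate, the sketch below computes them from per-mutant records. The record layout and the sample values are made up for the example, and reading "AVG tests considered" as a mean per mutant is my interpretation of the description above.

```python
def summarize(mutants, full_suite_seconds):
    """Compute the table columns for one technique.

    Each mutant is a dict with (hypothetical) keys:
      killed_by_full_suite, killed_by_technique: bool
      tests_considered: int, seconds: float
    """
    false_negatives = sum(
        1
        for m in mutants
        if m["killed_by_full_suite"] and not m["killed_by_technique"]
    )
    avg_tests = sum(m["tests_considered"] for m in mutants) / len(mutants)
    run_time = sum(m["seconds"] for m in mutants)
    return {
        "false_negatives": false_negatives,
        "avg_tests_considered": avg_tests,
        "run_time": run_time,
        "speed_up": full_suite_seconds / run_time,
    }


# Made-up sample data, not results from the paper.
sample = [
    {"killed_by_full_suite": True, "killed_by_technique": True,
     "tests_considered": 2, "seconds": 0.5},
    {"killed_by_full_suite": True, "killed_by_technique": False,
     "tests_considered": 1, "seconds": 0.25},
    {"killed_by_full_suite": False, "killed_by_technique": False,
     "tests_considered": 0, "seconds": 0.25},
]
print(summarize(sample, full_suite_seconds=100.0))
```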

Our results show that goal-oriented mutation testing killed 55 out of 423 mutants (13%). This is a low number, and we believe the reason is that the focal methods approach currently has limited applicability to private methods in the production code. Focal methods work only if there are explicit test cases – but this is often not the case for private methods, as they tend to be tested indirectly through calls from public methods. Thus, we missed most mutants in private methods. This should be explored further in the future.

Regarding the run time, the results confirm that it is drastically reduced: from 4,365 seconds to 7.6 seconds, a speed-up of roughly 570x. This is a huge difference, but is it reasonable to give up this many killed mutants in exchange? Probably not yet, but once focal methods also cover private methods – yes, we think so!
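
The headline numbers can be rechecked from the figures quoted in this post. A quick sanity check in Python, using only those values (the variable names are mine):

```python
killed, total_mutants = 55, 423
full_suite_seconds, focal_seconds = 4365, 7.6

kill_ratio = killed / total_mutants            # fraction of mutants killed
speed_up = full_suite_seconds / focal_seconds  # full suite vs. focal methods

print(f"killed: {kill_ratio:.0%}, speed-up: {speed_up:.0f}x")
# prints "killed: 13%, speed-up: 574x"
```

The rounded run times give about 574x, in line with the 570x quoted here; the small gap to the 573.5x reported in the abstract is presumably down to rounding of the run times.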

Implications for Research

  • Focal methods appear to be a promising approach to reduce the scope of mutation testing.
  • Focal methods need to evolve to better cope with private methods – to kill more than 13% of the mutants.
  • A larger study on goal-oriented mutation testing would be a welcome research contribution.

Implications for Practice

  • Our example of using focal methods obtained a mutation testing speed-up of more than 500x on Apache Ant.
  • Running fewer test cases while retaining a reasonable fault revealing power would be a great step toward running mutation testing in the continuous integration pipeline.

Sten Vercammen, Mohammad Ghafari, Serge Demeyer, and Markus Borg. Goal-Oriented Mutation Testing with Focal Methods, In Proc. of the 9th ACM SIGSOFT International Workshop on Automating TEST Case Design, Selection, and Evaluation, 2018.

Abstract

Mutation testing is the state-of-the-art technique for assessing the fault-detection capacity of a test suite. Unfortunately, mutation testing consumes enormous computing resources because it runs the whole test suite for each and every injected mutant. In this paper we explore fine-grained traceability links at method level (named focal methods), to reduce the execution time of mutation testing and to verify the quality of the test cases for each individual method, instead of the usually verified overall test suite quality. Validation of our approach on the open source Apache Ant project shows a speed-up of 573.5 x for the mutants located in focal methods with a quality score of 80%.