Speeding up Mutation Testing via the Cloud: Lessons Learned for Further Optimizations

Toward mutants in the cloud

In academia, mutation testing has long been considered the best way to assess the effectiveness of test cases. Still, it is rarely applied in practice since it is computationally very expensive. But there is hope. In this paper, we evaluate a distributed mutation testing solution and report speed-ups between 12x and 12.7x for a setup with 16 workers.

This is the first paper by Sten Vercammen, a great PhD student in Antwerp, Belgium… And all figures on this page belong to him. Since the beginning of the year, I have been co-supervising Sten as part of the mutation testing research activity in the TESTOMAT project. The experimental work reported in this paper was done as part of his MSc thesis project – an impressive thesis in my eyes. The plan for the PhD project is to build on this paper to contribute to industrial adoption of mutation testing – by following the well-known recommendations for more efficient mutation testing: “Do fewer, Do Smarter, Do Faster”. Of course, we also want to evaluate our work with partners in the TESTOMAT consortium.

Mutation testing? Testing mutants?

Mutation testing is not as widespread as researchers in academia appear to believe. It’s a rather odd name, but the idea behind it is quite simple. How do you know that your test cases are effective at finding bugs in your product code? Measuring code coverage is better than doing nothing at all, but several studies show that high coverage might not be sufficient, e.g., Inozemtseva and Holmes (2014). What can we do instead? Enter mutation testing.

Mutation testing, first proposed in the 1970s, systematically injects faults into production code by applying various mutation operators, e.g., replacing + with - or < with >. The outcome of applying such an operator, i.e., a modified piece of software, is called a mutant. Then you run the test cases and count how many of the mutants are caught. If at least one test case fails on a mutant, the mutant is said to be killed.
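To make the idea concrete, here is a minimal sketch in Python. The function and test names are made up for illustration and have nothing to do with the tools in the paper. It applies one mutation operator (+ replaced with -) and shows that a weak test lets the mutant survive while a stronger test kills it.

```python
import operator

def make_total(op):
    """Build a total() function around the given arithmetic operator."""
    def total(price, shipping):
        return op(price, shipping)
    return total

original = make_total(operator.add)  # production code: price + shipping
mutant = make_total(operator.sub)    # mutant: + replaced with -

def weak_test(total):
    # Passes for both versions because shipping is 0, so the mutant survives.
    return total(10, 0) == 10

def strong_test(total):
    # Fails for the mutant (10 - 5 != 15), so the mutant is killed.
    return total(10, 5) == 15

def is_killed(mutant_fn, tests):
    """A mutant is killed if at least one test fails when run against it."""
    return any(not test(mutant_fn) for test in tests)

print(is_killed(mutant, [weak_test]))               # False
print(is_killed(mutant, [weak_test, strong_test]))  # True
```

A surviving mutant is a hint that the test suite would not notice that particular fault, which is exactly the information code coverage alone does not give you.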

DIMUTESTAS – a proof-of-concept cloud solution

Applying mutation operators to any program generates numerous mutants. And for each mutant, you need to execute all test cases to see whether the mutant is killed. Obviously, this is computationally very expensive! Previous work has demonstrated that mutation testing can be parallelized. In this work, we present a larger study on the possible speed-up when running mutation testing in a cloud solution with 16 workers running on 8 physical nodes.

DiMuTesTas architecture – process view.

The DiMuTesTas tool is a proof-of-concept implementation of mutation testing using a cloud solution. We use Docker to deploy it on a physical system. Our hardware setup is eight Intel Core2 Quad Q9650 CPUs, each with 8 GB RAM (the nodes) – used to run 1, 2, 4, 8, and 16 workers. RabbitMQ is used to manage the task queue and the message passing. The master performs an initial build and runs all test cases, and if they succeed, mutant generation and killing commences in a parallel fashion. There is also a file server to store mutants and results.
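The master/worker pattern described above can be sketched with nothing but the Python standard library. This is not the DiMuTesTas code and it does not use RabbitMQ or Docker: an in-process queue.Queue stands in for the task queue, threads stand in for workers, and run_tests_on is a hypothetical placeholder for building a mutant and running the test suite against it.

```python
import queue
import threading

def run_tests_on(mutant_id):
    """Hypothetical stand-in for building one mutant and running the tests.
    Returns True if the mutant is killed. Here every third mutant survives."""
    return mutant_id % 3 != 0

def worker(tasks, results):
    # Each worker pulls independent tasks until it sees the None sentinel.
    while True:
        mutant_id = tasks.get()
        if mutant_id is None:
            break
        results.put((mutant_id, run_tests_on(mutant_id)))

def mutation_run(mutant_ids, n_workers):
    """Master: enqueue one task per mutant, then collect all results."""
    mutant_ids = list(mutant_ids)
    tasks, results = queue.Queue(), queue.Queue()
    threads = [threading.Thread(target=worker, args=(tasks, results))
               for _ in range(n_workers)]
    for t in threads:
        t.start()
    for mutant_id in mutant_ids:
        tasks.put(mutant_id)
    for _ in threads:
        tasks.put(None)  # one stop signal per worker
    for t in threads:
        t.join()
    return dict(results.get() for _ in mutant_ids)

outcome = mutation_run(range(1, 13), n_workers=4)
killed = sum(outcome.values())
print(f"{killed}/{len(outcome)} mutants killed")  # 8/12 mutants killed
```

Note that no task depends on the result of another one, so the workers never have to wait for each other. That independence is what makes the queue-based design easy to scale.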

Results from the cloud

We study two industrial cases in this paper: 1) a small project from the logistics company Intris (the test suite takes only 7 s to run) and 2) a larger project from the e-health company HealthConnect (48 kLoC + 5 k lines of test code). The table below shows the execution times when running DiMuTesTas using up to 16 workers compared to the completely sequential mutation testing tool LittleDarwin – which also happens to be a great tool!

Execution times when parallelizing mutation testing with DiMuTesTas.

We notice that the execution time almost halves each time the number of workers doubles, thus the speed-up increases linearly. We also see that there is some overhead involved in the cloud solution – LittleDarwin is faster than DiMuTesTas with a single worker. We conclude that our proof-of-concept achieves a speed-up between 12x and 12.7x on a cloud infrastructure with 16 workers.
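One way to put these numbers in perspective is parallel efficiency, i.e., the speed-up divided by the number of workers. This metric is my addition for illustration, not something reported in the paper. A perfectly linear speed-up on 16 workers would be 16x, so the reported 12x and 12.7x correspond to roughly 75% and 79% efficiency.

```python
def efficiency(speedup, workers):
    """Parallel efficiency: the fraction of the ideal speed-up achieved."""
    return speedup / workers

for s in (12.0, 12.7):
    print(f"speed-up {s}x on 16 workers -> efficiency {efficiency(s, 16):.0%}")
```

The remaining 20–25% is the overhead of the distributed setup, which the detailed results break down further.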

Detailed results

We also measured which parts of the parallel DiMuTesTas solution suffer from delays, and which parts indeed get linear speed-ups. This might be important information when trying to develop future solutions for mutation testing in the cloud. The table below shows the results for the following parts:

  • Setup delay: copying build dependencies to each worker.
  • Initial build: building the project on the master and running the test suite.
  • Mutant generation: generating mutants for each worker.
  • Mutant execution: executing test cases to kill mutants.
  • RabbitMQ scheduling delay: pulling and pushing tasks to the queue.
  • File server delay: copying data back and forth.

Execution time for the individual steps involved in mutation testing.

Our results show that the initial build delay is constant (as expected). The delay involved in mutant execution is also (almost) constant, but as the figure shows, it accounts for most of the execution time, making it the primary target for future optimization efforts. We observe that the setup delay grows linearly with the number of nodes rather than workers. Regarding the mutant generation delay, we actually notice a decrease when using more workers – this is likely an effect of workers being able to use the available memory better when the workload is smaller. Finally, the RabbitMQ delay and the file server/disk delays also grow linearly with the number of workers.
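A toy cost model can help reason about how these observations interact. This is only one simple reading of the results, not a model from the paper: it assumes a constant build, a fixed amount of mutant execution work shared evenly by the workers, and overheads that add to the wall-clock time and grow linearly with nodes or workers. Mutant generation is left out for simplicity. All parameter names and numbers below are hypothetical.

```python
def wall_clock(workers, nodes, build, execution_total,
               setup_per_node, queue_per_worker, files_per_worker):
    """Toy wall-clock estimate in seconds under the assumptions stated above."""
    return (build                                  # constant initial build
            + execution_total / workers            # execution work is shared
            + setup_per_node * nodes               # setup grows with nodes
            + (queue_per_worker + files_per_worker) * workers)  # queue + files

# Hypothetical numbers, chosen only to illustrate the shape of the curve.
params = dict(build=60, execution_total=16000,
              setup_per_node=5, queue_per_worker=1, files_per_worker=2)

one = wall_clock(workers=1, nodes=1, **params)
sixteen = wall_clock(workers=16, nodes=8, **params)
print(f"1 worker: {one:.0f} s, 16 workers: {sixteen:.0f} s, "
      f"speed-up {one / sixteen:.1f}x")
```

Even with small overheads the speed-up stays below 16x, and since the execution term dominates in this sketch, shrinking it has by far the largest effect.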

In conclusion, our lessons learned provide directions for future work on optimizing mutation testing. We believe that our proof-of-concept implementation of parallel mutation testing using DiMuTesTas paves the way for future cloud-based solutions – and this could really make mutation testing scale to fit into nightly build pipelines.

Implications for Research

  • Task independence is important – mutation testing optimizations that violate this principle will be less applicable.
  • Optimization efforts should primarily target mutant execution.
  • We propose implementing multicast to optimize setup delays.

Implications for Practice

  • Mutation testing is getting closer to widespread industrial adoption.
  • DiMuTesTas achieves speed-ups between 12x and 12.7x with 16 workers.
  • Speed-ups from parallelization allow mutation testing during the nightly build.

Sten Vercammen, Serge Demeyer, Markus Borg, and Sigrid Eldh. Speeding up Mutation Testing via the Cloud: Lessons Learned for Further Optimisations, In Proc. of the 12th International Symposium on Empirical Software Engineering and Measurement, 2018.

Abstract

Background: Mutation testing is the state-of-the-art technique for assessing the fault detection capacity of a test suite. Unfortunately, it is seldom applied in practice because it is computationally expensive. We witnessed 48 hours of mutation testing time on a test suite comprising 272 unit tests and 5,258 lines of test code for testing a project with 48,873 lines of production code. Aims: Therefore, researchers are currently investigating cloud solutions, hoping to achieve sufficient speed-up to allow for a complete mutation test run during the nightly build. Method: In this paper we evaluate mutation testing in the cloud against two industrial projects. Results: With our proof-of-concept, we achieved a speed-up between 12x and 12.7x on a cloud infrastructure with 16 nodes. This allowed to reduce the aforementioned 48 hours of mutation testing time to 3.7 hours. Conclusions: We make a detailed analysis of the delays induced by the distributed architecture, point out avenues for further optimization and elaborate on the lessons learned for the mutation testing community. Most importantly, we learned that for optimal deployment in a cloud infrastructure, tasks should remain completely independent. Mutant optimization techniques that violate this principle will benefit less from deploying in the cloud.