Ah, autotests! But hey, who checks the output?
Automate all testing – save a lot of time! A common approach in contemporary software engineering. However, by automating test execution you merely postpone the manual effort – when you run thousands of tests several times a day, you will inevitably get zillions of test reports to analyze. There is a risk of test information overload! This paper goes beyond automated test execution by looking into automated analysis of test reports.
The software testing research front and what is actually done in industry are in many aspects decades apart. Testing researchers come up with the coolest techniques to generate test cases, promising great benefits for system verification – yet a lot of testing still means manually following test specifications. Click here, input this and that, toggle a checkbox, check output… One exception, with a lot of interest in both academia and industry, is test automation.
Of course test automation is not the solution to all V&V problems, but in general it is a good thing. Many companies have put a lot of effort into automated test suites. I have first-hand experience from my time at ABB – one of the early offshoring attempts at my organization was to place test automation projects in India. It didn’t always work out that well, but that’s another story. The point is that ABB wasn’t unique in trying to automate as many test cases as possible – the same strategy appears in many of the companies I’ve studied during my PhD.
More than test execution
When discussing test automation, it is a common mistake to consider only test execution. Sure, this is an area where you can get rid of boring and repetitive manual clicking. Our perspective in this paper is that if you massively automate test execution, you shift the manual work from running tests to the later analysis stage – what do you do if you generate thousands and thousands of test reports daily? You might drown in that data, so you should automate this step as well. If you have a complex branching strategy, as in the case of Qlik, everything gets even more difficult – see figure below. Do the same test cases fail on multiple branches? How do these results compare to the test reports from yesterday?
NIOCAT – Navigating information overload
This project was done as an MSc thesis project by two great students: Nicklas and Vanja. The case company, Qlik, provided the general problem description – they had a real need to improve their management of test results. Nicklas and Vanja explored the test environment at Qlik and studied information retrieval and machine learning approaches to deal with semi-structured textual data. The project resulted in the tool NIOCAT – Navigating Information Overload Caused by Automated Testing. NIOCAT clusters test case failures from arbitrary branches by calculating similarities between test case names and error messages, providing test analysts with a starting point for further investigation. The results are presented in Qlik’s QlikView product, offering state-of-the-art interaction with the output data.
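To make the clustering idea concrete, here is a minimal sketch in Python – not the actual NIOCAT implementation, just one way the principle could look. A failure carries a branch, a test name, and an error message; it joins a cluster when a weighted string similarity to the cluster’s first member exceeds a threshold. The field names, the weights, the threshold, and the greedy strategy are all assumptions made for the example.

```python
# Hypothetical sketch of similarity-based failure clustering (not the actual
# NIOCAT implementation): each failure is compared to existing clusters using
# a weighted combination of test-name and error-message similarity.
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass
class Failure:
    branch: str          # e.g. "release-1.2" -- illustrative field names
    test_name: str
    error_message: str


def similarity(a: Failure, b: Failure,
               w_name: float = 0.5, w_msg: float = 0.5) -> float:
    """Weighted string similarity between two test case failures."""
    sim_name = SequenceMatcher(None, a.test_name, b.test_name).ratio()
    sim_msg = SequenceMatcher(None, a.error_message, b.error_message).ratio()
    return w_name * sim_name + w_msg * sim_msg


def cluster(failures: list[Failure], threshold: float = 0.7) -> list[list[Failure]]:
    """Greedy clustering: add a failure to the first cluster whose
    representative (first member) is similar enough, else start a new one."""
    clusters: list[list[Failure]] = []
    for f in failures:
        for c in clusters:
            if similarity(f, c[0]) >= threshold:
                c.append(f)
                break
        else:
            clusters.append([f])
    return clusters


if __name__ == "__main__":
    failures = [
        Failure("main", "test_open_document", "NullReferenceException in Loader"),
        Failure("release-1.2", "test_open_document", "NullReferenceException in Loader"),
        Failure("main", "test_export_pdf", "Timeout waiting for render"),
    ]
    for i, c in enumerate(cluster(failures)):
        print(f"Cluster {i}: {[(f.branch, f.test_name) for f in c]}")
```

The interesting knobs here are the weights and the similarity threshold – the kind of parameters whose influence on the clustering we examined in the evaluation.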
We evaluated NIOCAT using a combination of quantitative and qualitative research. First, we developed a gold standard clustering and evaluated how close the NIOCAT output could get – as the results were promising, we invited some Qlik test analysts to a focus group meeting to gather in-depth feedback. We concluded that NIOCAT offers a novel overview of test results from automated testing – allowing a user to quickly discover on which branches a problem occurs and how many test runs failed because of it. We also show how the optimal parameter weighting of NIOCAT varies with different similarity thresholds – nicely visualized in triangle plots… One of the reviewers agreed! =)
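For the quantitative part, the essence is measuring how close a tool-generated clustering comes to an analyst-made gold standard. The sketch below uses pairwise precision/recall/F1 over failure pairs – a common way to score clusterings, chosen here purely for illustration; the exact metric and data in the paper differ.

```python
# Illustrative sketch of comparing a tool's clustering against a manually
# created gold standard, using pairwise precision/recall/F1.
from itertools import combinations


def pairs(clustering: list[list[str]]) -> set[frozenset[str]]:
    """All unordered pairs of items that share a cluster."""
    result: set[frozenset[str]] = set()
    for cluster in clustering:
        for a, b in combinations(cluster, 2):
            result.add(frozenset((a, b)))
    return result


def pairwise_f1(gold: list[list[str]], predicted: list[list[str]]) -> float:
    gold_pairs, pred_pairs = pairs(gold), pairs(predicted)
    if not gold_pairs or not pred_pairs:
        return 0.0
    tp = len(gold_pairs & pred_pairs)
    if tp == 0:
        return 0.0
    precision = tp / len(pred_pairs)
    recall = tp / len(gold_pairs)
    return 2 * precision * recall / (precision + recall)


# Example: failure IDs grouped by an analyst (gold) vs. by the tool.
gold = [["f1", "f2", "f3"], ["f4", "f5"]]
predicted = [["f1", "f2"], ["f3"], ["f4", "f5"]]
print(f"Pairwise F1: {pairwise_f1(gold, predicted):.2f}")
```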
Implications for Research
- A new danger of running large amounts of automated testing is highlighted – information overload in the test results.
- Interaction with visual decision support data is fundamental according to previous research – we get this by showing results in QlikView.
- Parameter weighting for different similarity thresholds can be visualized in triangle plots – see the toy sketch after this list.
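As a toy illustration of the triangle-plot idea from the last bullet: three parameter weights that sum to one can be mapped to barycentric coordinates inside a triangle and colored by a quality score. The random weights and the score function below are made up for the sketch.

```python
# Toy sketch of a triangle (ternary) plot: three parameter weights summing to
# one are mapped to barycentric coordinates and colored by a score.
import math
import random

import matplotlib.pyplot as plt


def to_cartesian(w1: float, w2: float, w3: float) -> tuple[float, float]:
    """Map (w1, w2, w3) with w1 + w2 + w3 == 1 into the unit triangle."""
    x = w2 + 0.5 * w3
    y = w3 * math.sqrt(3) / 2
    return x, y


# Sample some random weight combinations and a fake quality score.
points = []
for _ in range(200):
    a, b, c = (random.random() for _ in range(3))
    total = a + b + c
    w = (a / total, b / total, c / total)
    score = 1.0 - abs(w[0] - 0.4) - abs(w[1] - 0.4)  # arbitrary toy score
    points.append((*to_cartesian(*w), score))

xs, ys, scores = zip(*points)
plt.figure(figsize=(5, 4.5))
plt.scatter(xs, ys, c=scores, cmap="viridis", s=15)
plt.plot([0, 1, 0.5, 0], [0, 0, math.sqrt(3) / 2, 0], color="black")  # triangle edges
plt.colorbar(label="score (toy)")
plt.axis("off")
plt.title("Parameter weights in barycentric coordinates")
plt.show()
```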
Implications for Practice
- Do not simply automate test execution without considering the consequences – developers must be able to interpret the output data.
- Information overload caused by automated testing decreases the automation ROI, but the problem can be mitigated by tools.
- To support later analysis, by either man or machine, make sure test cases fail with clever error messages.
Nicklas Erman, Vanja Tufvesson, Markus Borg, Per Runeson, and Anders Ardö. In Proc. of the 8th IEEE International Conference on Software Testing, Verification and Validation (ICST), pp. 1-9, 2015. (link, preprint)
Abstract
Background. Test automation is a widely used technique to increase the efficiency of software testing. However, executing more test cases increases the effort required to analyze test results. At Qlik, automated tests run nightly for up to 20 development branches, each containing thousands of test cases, resulting in information overload. Aim. We therefore develop a tool that supports the analysis of test results. Method. We create NIOCAT, a tool that clusters similar test case failures, to help the analyst identify underlying causes. To evaluate the tool, experiments on manually created subsets of failed test cases representing different use cases are conducted, and a focus group meeting is held with test analysts at Qlik. Results. The case study shows that NIOCAT creates accurate clusters, in line with analyses performed by human analysts. Further, the potential time-savings of our approach are confirmed by the participants in the focus group. Conclusions. NIOCAT provides a feasible complement to current automated testing practices at Qlik by reducing information overload.