TuneR: A Framework for Tuning Software Engineering Tools with Hands-on Instructions in R

[Figure: Overview of the TuneR framework – three phases toward a setting.]

Want more than the default setting? Experiment!

Numerous highly configurable tools have been developed in software engineering research. Unfortunately, understanding the parameters involved in complex tools is a real challenge. We present TuneR – an experiment framework that helps tune software engineering tools using empirical methods.

So, you’ve just installed this great new tool. It runs out of the box without any fancy installation process, nice! But what about all these settings? Sure, the default values result in some output, but how much better could it be if you tuned it for your context? And how would you even start? TuneR is for you.

Loads of tools proposed in software engineering research implement really complex ideas. I’ve downloaded a few myself, and understanding how to configure the parameters can be quite a challenge. Far too often you simply keep all the default values, even though another setting would have been a real improvement! If, on the other hand, a new user does try to improve the setting, we argue that three strategies dominate: 1) ad hoc tuning (non-systematic, requires some luck), 2) quasi-exhaustive search (try a lot), and 3) changing one parameter at a time (does not consider interactions between parameters). But this can be done a lot better with proper design of experiments!

Meet TuneR and proper experimentation

TuneR, a three-phase framework, provides the structure to run such experiments. It builds on established approaches from experimentation in the physical world, but adapts them to the virtual nature of software tools. First, the preparation phase covers collecting data for tuning experiments, choosing the response metric (the goal of the tuning), and identifying the parameters. Without proper preparation, there can be no successful tuning. Second, screening is done to find which parameters are the most important. Third, response surface methodology means modeling the response in a simple way, and then iteratively changing the setting in the most promising direction. Once close to an optimum, a slightly more complex model is used to pinpoint the best setting.
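
To make the screening phase a bit more concrete, here is a minimal sketch in R using a two-level fractional factorial design from the FrF2 package – one possible choice, not necessarily the exact setup from the paper. The parameter names and the run_tool() call are made up for illustration.

library(FrF2)  # install.packages("FrF2") if needed

# A 2^(5-1) fractional factorial: 16 runs to screen five hypothetical parameters
plan <- FrF2(nruns = 16, nfactors = 5,
             factor.names = c("alpha", "beta", "gamma", "delta", "epsilon"))

# In a real tuning session, each row is one execution of the tool under study:
# y <- sapply(seq_len(nrow(plan)), function(i) run_tool(plan[i, ]))
y <- rnorm(16)  # placeholder response so the sketch runs as-is

# A main-effects model indicates which parameters influence the response the most
screening <- lm(y ~ ., data = cbind(as.data.frame(plan), y = y))
summary(screening)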

[Figure: Iteratively changing the setting to maximize the response, i.e., along the path of steepest ascent.]
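
In R, that step could look something like the sketch below, here using the rsm package; the coded variables x1 and x2 and the response values are invented for illustration.

library(rsm)  # install.packages("rsm") if needed

# A small two-level design around the current setting, in coded units, plus a center point
runs <- data.frame(x1 = c(-1,  1, -1, 1, 0),
                   x2 = c(-1, -1,  1, 1, 0),
                   y  = c(0.31, 0.35, 0.38, 0.44, 0.37))

# Fit a first-order (planar) model of the response
fo <- rsm(y ~ FO(x1, x2), data = runs)

# Suggested settings along the path of steepest ascent
steepest(fo, dist = 0:5)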

In this paper, we walk the reader through all TuneR steps by showing how we tuned our tool ImpRec. We present step-by-step instructions on how to do this using various R packages. All data required to follow along are of course published on a companion website. To get a good introduction to design of experiments, I really recommend Kevin Dunn’s MOOC Experimentation for Improvement available from Coursera – that course is what inspired me to write this paper.
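
For the final step – pinpointing the optimum – a second-order model is fitted once the experiments have homed in on a promising region. Again just a sketch with the rsm package and invented central-composite data; the real ImpRec numbers are in the paper and on the companion website.

library(rsm)

# Hypothetical central composite design in two coded variables
ccd_runs <- data.frame(
  x1 = c(-1,  1, -1, 1, -1.414, 1.414,  0,     0,     0, 0, 0),
  x2 = c(-1, -1,  1, 1,  0,     0,     -1.414, 1.414,  0, 0, 0),
  y  = c(0.40, 0.46, 0.44, 0.47, 0.39, 0.45, 0.42, 0.46, 0.48, 0.47, 0.48))

# Second-order model: linear, two-way interaction, and quadratic terms
so <- rsm(y ~ SO(x1, x2), data = ccd_runs)
summary(so)  # the canonical analysis reports the estimated stationary point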

Not a tutorial paper… or is it?

My first plan for this paper was to submit it to the very tool-oriented journal Automated Software Engineering (ASE). Seemed fitting, as a lot of complex tools are presented in that journal. Furthermore, I found that they explicitly call for submissions in the category tutorial paper. To get an idea of what an ASE tutorial paper looks like, I searched for tutorial papers but got no hits. Browsed a few earlier issues, but still no tutorials. I asked the editor-in-chief for examples of earlier tutorial papers, but was told that no tutorial papers had ever been published in ASE, since “it is difficult for them to meet the novelty standard maintained”. Kind of strange to call for tutorials then, right? I was actually discouraged from submitting my work. I did my best to stress novelty in the manuscript, and submitted it as a tutorial paper anyway… And the paper was rejected with a single review a few weeks later.

Instead, I submitted the manuscript to the Journal of Software: Evolution and Process, which doesn’t call for tutorial papers. I still explained in the cover letter that the submission was “written in a tutorial-style” – and the appointed editor and reviewers were quite happy with the paper. The whole experience makes me wonder about the value of tutorial papers. Should I have referred to it as a “guideline” instead? I think papers describing complex approaches might benefit from detailed step-by-step instructions – they make the work both understandable and repeatable. But now all is well and the paper is published! =)

Implications for Research

  • A framework that complements existing guidelines for conducting software engineering experiments.
  • A discussion on how tuning a software engineering tool differs from experiments in both the physical world and traditional computer experiments.
  • A successful example of using response surface methodology in software engineering.

Implications for Practice

  • If a software engineering tool does not provide feasible results using the default setting – try some well-designed experiments.
  • Attempt tuning in a systematic way, and do not change one parameter at a time (see the sketch below this list).
  • TuneR is accompanied by detailed instructions on how to analyze experimental results using R, a free software environment.
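
As a tiny illustration of why one-parameter-at-a-time tuning can mislead (with made-up numbers): starting from (0, 0), changing either parameter alone gives only a small gain, but the factorial model reveals the interaction that explains the big jump when both are changed together.

# Toy 2x2 factorial with a strong interaction (hypothetical response values)
runs <- expand.grid(a = c(0, 1), b = c(0, 1))
runs$y <- c(0.30, 0.35, 0.34, 0.55)

# One-factor-at-a-time from (0, 0) sees only +0.05 and +0.04;
# the fitted interaction term a:b of +0.16 explains the jump to 0.55
coef(lm(y ~ a * b, data = runs))
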
Markus Borg. TuneR: A Framework for Tuning Software Engineering Tools with Hands-on Instructions in R, Journal of Software: Evolution and Process, 28(6), pp. 427-459, 2016. (link, preprint, data)

Abstract

Numerous tools automating various aspects of software engineering have been developed, and many of the tools are highly configurable through parameters. Understanding the parameters of advanced tools often requires deep understanding of complex algorithms. Unfortunately, suboptimal parameter settings limit the performance of tools and hinder industrial adaptation, but still few studies address the challenge of tuning software engineering tools. We present TuneR, an experiment framework that supports finding feasible parameter settings using empirical methods. The framework is accompanied by practical guidelines of how to use R to analyze the experimental outcome. As a proof-of-concept, we apply TuneR to tune ImpRec, a recommendation system for change impact analysis in a software system that has evolved for more than two decades. Compared with the output from the default setting, we report a 20.9% improvement in the response variable reflecting recommendation accuracy. Moreover, TuneR reveals insights into the interaction among parameters, as well as nonlinear effects. TuneR is easy to use, and thus the framework has potential to support tuning of software engineering tools in both academia and industry.