(Blogged by Saket S.)
The scope of software systems has grown consistently, not just in lines of code but also in factors such as complexity, decentralization, and degree of heterogeneity. In Requirements Engineering, these factors are among the hottest topics of research.
Several studies have emphasized the need to revisit existing software and requirements engineering methods, mostly due to the pressure of increased size and the complexity that follows from it. Beyond that, one of the most important challenges in requirements engineering and management for large, complex systems is providing sufficient automated support for activities that are not only tedious but also time-consuming.
In one study, machine learning was used to trace regulatory codes to particular product requirements. One of the methods successfully identified 1,806 of the 1,889 trace links; the remaining 83 links were left entirely for the analysts to find manually.
Following are the key activities for requirements traceability:
- Establishing traceability links between artifacts, i.e., trace capture.
- Maintaining traceability links between artifacts, i.e., trace maintenance.
Semi-automatic methods therefore only help in reducing the search space; they do not help in decision-making. It is crucial to develop measures that go beyond recall and precision: without finer measures on which evaluations can be based, improving traceability tools becomes harder.
Software engineering is a tedious exercise, and aligning the processes involved is one of the critical challenges that needs to be addressed effectively and efficiently. Traceability tools can improve this alignment: they are used to create and maintain trace links between software artifacts. Most available traceability tools are semi-automatic: they present a list of candidate trace links and require a human analyst to make the final call on whether or not to establish each link. The most widely accepted methods for evaluating traceability techniques usually fail to measure the support for this decision; it is not covered by precision and recall.
Precision expresses exactness and recall expresses completeness, and the two trade off against each other: effort spent increasing recall typically decreases precision, and vice versa. It is therefore important to combine precision and recall into a single measure rather than consider either in isolation. One such measure is the F-measure, a weighted harmonic mean of precision and recall.
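As a concrete illustration, the weighted harmonic mean takes only a few lines to compute (a minimal sketch; the function name is ours, the β-weighting convention is the standard F-measure definition):

```python
def f_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    """Weighted harmonic mean of precision and recall.

    beta > 1 weights recall more heavily, beta < 1 favors precision;
    beta = 1 gives the balanced F1 score.
    """
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1.0 + b2) * precision * recall / (b2 * precision + recall)

# A tool tuned for high recall at the cost of precision:
print(round(f_measure(0.4, 0.9), 3))  # 0.554
```

Note how the harmonic mean punishes the imbalance: the arithmetic mean of 0.4 and 0.9 would be 0.65, but the F1 score is only about 0.55.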
Many methods have been proposed to overcome this challenge and semi-automate requirements tracing. Three methods rooted in information retrieval (IR) dominate tool support for semi-automatic requirements traceability: Latent Semantic Analysis (LSA), probabilistic approaches, and the Vector Space Model (VSM). Although these methods show good precision and recall, they have the following drawbacks:
- They require data pre-processing, which must be done manually.
- Their performance depends directly on the quality of the input data.
- Analysts are left with a list of candidate links that must be investigated manually.
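To make the last drawback concrete, the sketch below (pure Python, with made-up requirement and test-case texts) shows how a Vector Space Model tool might rank candidate links by cosine similarity over term-frequency vectors; the ranked list still lands on the analyst's desk for vetting:

```python
import math
from collections import Counter

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def candidate_links(requirement: str, artifacts: dict, threshold: float = 0.1):
    """Rank artifacts against one requirement; the analyst vets the result."""
    req_vec = Counter(requirement.lower().split())
    scored = [(name, cosine(req_vec, Counter(text.lower().split())))
              for name, text in artifacts.items()]
    return sorted([(n, s) for n, s in scored if s >= threshold],
                  key=lambda pair: -pair[1])

artifacts = {
    "TC-1": "verify the user login form rejects an empty password",
    "TC-2": "measure database query latency under load",
}
links = candidate_links("the login form shall require a password", artifacts)
print(links)  # only TC-1 survives the threshold
```

A real tool would add stemming, stop-word removal, and tf-idf weighting, but the shape of the output — a scored candidate list, not a decision — is the same.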
Signal-to-Noise Ratios (SNR) to support requirements traceability
SNR is a complementary quality measure that can bring qualitative insights into the tested methods. It can be applied to new traceability methods to evaluate how well they distinguish correct links in the search space.
SNR is widely used in science and engineering to quantify how much a signal is corrupted by noise, which it does by comparing the desired signal level to the background noise level. The classical definition of SNR is the ratio between the average power of the signal (Psignal) and the average power of the noise (Pnoise). Because signals have a wide dynamic range, SNR is usually expressed on the logarithmic decibel scale.
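The classical definition translates directly into code (a minimal sketch of the standard 10·log10 power-ratio formula; the example values are illustrative):

```python
import math

def snr_db(p_signal: float, p_noise: float) -> float:
    """Classical SNR: average signal power over average noise power,
    expressed on the logarithmic decibel scale."""
    return 10.0 * math.log10(p_signal / p_noise)

print(snr_db(100.0, 1.0))  # 20.0 dB: signal power is 100x the noise power
```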
An alternative definition of SNR is the ratio of the mean (mu) to the standard deviation (sigma) of a measurement or signal. This definition is particularly common in image processing, where the SNR of an image can be calculated as the mean pixel value divided by the standard deviation of the pixel values in a controlled environment. The ratio and the noise are inversely related: the higher the ratio, the less noticeable the background noise. In simpler terms, SNR is the ratio of meaningful information to meaningless background noise.
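The mean-over-standard-deviation form is just as short (a sketch; the sample pixel values are made up for illustration):

```python
import statistics

def snr_mu_sigma(samples):
    """Alternative SNR: mean of the measurement divided by its
    (population) standard deviation, as used in image processing."""
    mu = statistics.mean(samples)
    sigma = statistics.pstdev(samples)
    return mu / sigma if sigma else float("inf")

# Pixel values tightly clustered around 100 -> little noise, high SNR.
pixels = [98, 101, 100, 99, 102]
print(round(snr_mu_sigma(pixels), 1))  # 70.7
```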
In digital signal processing, noise is simply the error signal caused by signal quantization, assuming the analog-to-digital conversion has already been performed.
Software engineering is challenging! But the bigger task is to keep the processes involved in producing software aligned. One approach to tackling this challenge is traceability. Many semi-automated traceability tools are available, but in the end they require human analysts to make the decisions. Signal-to-Noise Ratio (SNR) is a family of measures that can be used to estimate how easy it is for humans to vet a list of candidate trace links. After applying SNR to an entirely new set of requirements and evaluating three methods, the results show a very small difference between signal and noise, which explains why analysts find trace link vetting difficult. We propose that such measures should help improve semi-automatic techniques and lay a strong foundation for automated tools.
K. Wnuk, M. Borg, and T. Gorschek. Towards New Ways of Evaluating Methods of Supporting Requirements Management and Traceability using Signal-to-Noise Ratio. In Proc. of the 14th International Conference on Evaluation of Novel Approaches to Software Engineering, 2019.
Developing contemporary software solutions requires many processes and people working in synergy to achieve a common goal. Any misalignment between parts of the software production cycle can severely impede the quality of the development process and its resulting products. In this paper, we focus on improving the means for measuring the quality of methods used to support finding similarities between software product artifacts, especially requirements. We propose a new set of measures, Signal-to-Noise ratios, which extend the commonly used precision and recall measures. We test the applicability of all three types of SNR on two methods for finding similar requirements: the normalized compression distance (NCD), originating from the domain of information theory, and the Vector Space Model, originating from computational linguistics. The results reveal an interesting property of all types of SNR: all the values are centered around 1, which confirms our hypothesis that the analyzed methods can only limit the search space for the analysis. The analyst may still have difficulties in manually assessing the correct links among the incorrect ones.
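The normalized compression distance mentioned in the abstract can be sketched with a stock compressor (here zlib stands in as the compressor C(·); the example requirement texts are made up):

```python
import zlib

def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance: near 0 for very similar strings,
    approaching 1 for unrelated ones. len(zlib.compress(.)) plays C(.)."""
    cx, cy = len(zlib.compress(x)), len(zlib.compress(y))
    cxy = len(zlib.compress(x + y))
    return (cxy - min(cx, cy)) / max(cx, cy)

req_a = b"the system shall log every failed authentication attempt"
req_b = b"the system shall log each failed authentication attempt"
req_c = b"render the quarterly sales report as a PDF document"

# Similar requirements compress well together -> smaller distance.
print(ncd(req_a, req_b) < ncd(req_a, req_c))  # True
```

The intuition: if two texts share most of their content, compressing their concatenation costs little more than compressing one of them alone, so the distance stays small.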