Can RE Help to Better Prepare Industrial AI for the Commercial Scale?

This is a personal copy of a column in IEEE Software (Nov/Dec 2022). Republished with permission.

With this issue, I start my term as department editor for the Requirements column. I very much look forward to exploring contemporary aspects of requirements and RE in the coming years! As an institute researcher with RISE, I primarily work in strictly regulated domains in which requirements are cornerstones of development. Please see my introduction in the previous IEEE Software issue for my background. In this issue, which features a theme that matches my current research interests perfectly, we share our thoughts on RE for AI (RE4AI) from the perspective of Siemens Digital Industries. Referring to this as Industrial AI, we share insights from our numerous conversations on the topic over the last two years, including formal interviews with key stakeholders. In this column, we argue that the business side of AI has been underexplored and that RE can help us move forward.
Markus Borg, Requirements Department Editor

Boris Scharinger, Siemens Digital Industries, Fürth, Germany
Markus Borg, RISE Research Institutes of Sweden AB, Lund, Sweden
Andreas Vogelsang, University of Cologne, Cologne, Germany
Thomas Olsson, RISE Research Institutes of Sweden AB, Lund, Sweden

“Hey AI, show me the money!” Sure, the impact of AI on industries and society is huge. However, we are still waiting for that steady stream of lucrative success stories in Industrial AI.

Despite increasing AI maturity, industrial AI still lacks proven recipes for positive business cases. AI adoption in the manufacturing and process industries has been fairly low [5]. Similar to what start-ups report from the AI venture capital market [6], we observe that Death by Proof-of-Concept is a widespread phenomenon.

Many argue that Machine Learning (ML) is just another piece of technology. That is, ML extends the software and systems engineering toolbox, and we all just need to learn how to use it properly. However, the dependency and interaction between data and ML-based applications introduce novel challenges.

How to best master the interplay between training data, artificial neural network architectures, and the resulting applications remains unclear. We still have to work out how to align and adapt software development methods to the characteristics of ML. We see RE as a main driver of this maturation.

The Commercial Challenge for Industrial AI

The combination of AI’s massive impact and the difficulty of mastering the craft creates an opportunity for a competitive edge. Organizations that move through the learning curve faster will benefit disproportionately. A McKinsey Global Institute report foresees AI frontrunners doubling their cash flow by 2030, while laggards are likely to experience a cash flow decline of around 20% in this period [1].

While some companies’ business models rely on a novel ML solution, most companies instead employ ML to improve something that already exists. Perhaps a data-driven approach optimizes a process, or ML automates part of an existing process.

During the last three to four years, most AI roadmaps were all about recruiting and building up a data science team, then entering the technology learning curve (exploring) and finding the first useful correlations in data (experimenting). Organizations would readily spend $500k in project effort on a use case that ultimately delivered a productivity gain of $75k. In other words: “allocate your hours to the research cost center!”

A Macro-Perspective on RE

Two major movements are currently driving ML maturity forward.

First, there is a societal movement for responsible and trustworthy ML. Cautionary authors such as Cathy O’Neil [4] have made people increasingly aware that data-driven algorithms affect their everyday lives. The European Union’s proposed AI Act is lawmakers’ response to this “trustworthy AI” challenge.

Second, there is the need to crack the tantalizing nut of autonomous driving. After a first phase of R&D that proved the feasibility of self-driving cars, automotive and technology companies are now, pressed by urgent safety and liability questions, working on making ML repeatable, provable, and verifiable.

The RE community has been relatively late to the party: only recently have the first research studies targeting ML systems appeared. On the other hand, data scientists and ML engineers have shown little interest in the existing RE body of knowledge.

At Siemens and elsewhere, we find that ML projects rarely follow the structured RE processes typically used in non-ML projects. However, we found no particular reason that would disqualify the use of established RE processes [2]. Rather, many projects still appear to be “playing around” with ML solutions because of high uncertainty about their capabilities. Also, there has so far been little pressure from production environments to professionalize ML solutions.

The culture of the data science teams we observed is heavy on the Dev side and very light on the Ops side. As described by Kim et al. [3], the development side seeks challenges and is change-driven, whereas the operations side seeks stability and is averse to change. We recognize that most data science curricula contain few engineering topics such as RE. Most junior data scientists at Siemens report an educational background split roughly equally between computer science and statistics.

ML Project Pitfalls and Potential RE Remedies

Beyond supporting Industrial AI on the macro level, we see opportunities for RE to help address six specific ML project pitfalls.

(1) Costs of False Positives vs. False Negatives: Consider the example of ML for vision-based quality inspection. The cost of a false positive (a workpiece diagnosed as faulty although it is fine) differs from the cost of a false negative (a faulty item is missed and “slips”). Even worse, these costs vary significantly between customer environments, because the cost of slippage versus the cost of scrapping a good item depends on how much production effort has accumulated before the inspection.

At the same time, an ML model can be trained to avoid false negatives at any price, at the cost of allowing more false positives [7]. Or vice versa. Or anything in between. So, at this intersection of business and technology, a lack of specification inevitably leads to ML models that miss the customer-specific optimum, thereby wasting the customer’s money.
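To make the trade-off concrete, here is a minimal Python sketch of cost-sensitive threshold selection. It assumes the model outputs a fault score per workpiece and that each customer’s scrap cost (false positive) and slippage cost (false negative) have been specified. Instead of retraining, it tunes the decision threshold, which is one common way to move along the trade-off; the function names, scores, and costs are hypothetical illustrations, not figures from our projects.

```python
# Minimal sketch: pick the decision threshold that minimizes total misclassification cost.
# Assumption: label 1 = faulty workpiece, 0 = good; score >= threshold means "flag as faulty".
# All scores and costs below are hypothetical.

def total_cost(scores, labels, threshold, cost_fp, cost_fn):
    """Cost of scrapping good items (false positives) plus slipped faulty items (false negatives)."""
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    return fp * cost_fp + fn * cost_fn


def best_threshold(scores, labels, cost_fp, cost_fn):
    """Try every observed score (plus 'never flag') and keep the cheapest threshold."""
    candidates = sorted(set(scores)) + [float("inf")]
    return min(candidates, key=lambda t: total_cost(scores, labels, t, cost_fp, cost_fn))


# Hypothetical validation data, identical for both customers.
scores = [0.1, 0.3, 0.4, 0.6, 0.7, 0.9]
labels = [0, 0, 1, 0, 1, 1]

# Customer A: slippage is expensive, scrapping is cheap.
print(best_threshold(scores, labels, cost_fp=10, cost_fn=1000))  # 0.4
# Customer B: scrapping a good item is expensive relative to slippage.
print(best_threshold(scores, labels, cost_fp=500, cost_fn=100))  # 0.7
```

The same model yields a different optimal operating point for each customer, which is why the cost ratio belongs in the specification rather than being left to the data scientist’s judgment.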

(2) Decision Quality Baselines: When ML is used to automate manual tasks, the performance gain over manual execution must be determined and discussed. For example, a person doing the quality inspection with a hit rate of 85% sets the decision quality baseline. An ML model reaching 92% on the same measure outperforms this baseline. When no baseline is specified, data scientists tend to work toward an “as good as possible” target. This has led to many Proof-of-Concept phases that ran much longer and cost much more than needed.
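One way to turn a decision quality baseline into a testable acceptance criterion is sketched below in Python. As an assumption, it treats the model’s hit rate on a labeled test set as a binomial proportion and accepts the model only if the lower end of a 95% Wilson score interval exceeds the human baseline; the function names and test-set sizes are hypothetical.

```python
import math


def wilson_lower_bound(successes: int, n: int, z: float = 1.96) -> float:
    """Lower end of the Wilson score interval for a proportion (z=1.96 -> 95% two-sided)."""
    if n == 0:
        return 0.0
    p = successes / n
    denom = 1 + z**2 / n
    centre = p + z**2 / (2 * n)
    margin = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2))
    return (centre - margin) / denom


def meets_baseline(correct: int, total: int, baseline: float) -> bool:
    """Acceptance check: is the model credibly better than the human baseline?"""
    return wilson_lower_bound(correct, total) > baseline


HUMAN_BASELINE = 0.85  # the inspector's hit rate from the example above

print(meets_baseline(46, 50, HUMAN_BASELINE))     # False: 92%, but too little evidence
print(meets_baseline(920, 1000, HUMAN_BASELINE))  # True: 92% on a larger test set
```

Once such a check passes, the Proof-of-Concept goal is met, which gives the team a defined stopping point instead of an open-ended “as good as possible” target.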

(3) Underestimation of Lifecycle Costs: ML is not just about developing the solution once. ML models have to be operated, monitored, and retrained to assure the necessary level of quality while environmental parameters (humidity, lighting conditions on the shop floor, etc.) change; this is known as data drift. RE plays an instrumental role in specifying the tolerated data drift and the severity and costs of its consequences. Also, we see RE as the tool of choice to shape functions and features supporting lifecycle management. Sophisticated MLOps infrastructure must be in place to manage and monitor ML models, and RE can help assess its completeness and support related cost estimations.
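As an illustration of what a specified drift tolerance could look like, the Python sketch below compares the distribution of a monitored input (here, shop-floor lighting in lux) with the training-time reference using the Population Stability Index (PSI). PSI, the bin edges, and the 0.2 threshold (a common rule of thumb) are our assumptions; in practice, RE would derive the tolerance from the severity and cost of the consequences.

```python
import bisect
import math


def psi(reference, current, edges, eps=1e-6):
    """Population Stability Index between two samples binned on fixed edges.

    Values outside the edges are clamped into the first or last bin;
    eps avoids log(0) for empty bins.
    """
    n_bins = len(edges) - 1

    def proportions(values):
        counts = [0] * n_bins
        for v in values:
            i = min(max(bisect.bisect_right(edges, v) - 1, 0), n_bins - 1)
            counts[i] += 1
        total = max(len(values), 1)
        return [max(c / total, eps) for c in counts]

    ref, cur = proportions(reference), proportions(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref, cur))


# Hypothetical requirement: retrain when lighting drift exceeds PSI 0.2.
DRIFT_TOLERANCE = 0.2
LUX_EDGES = [0, 200, 400, 600, 800, 1000]


def needs_retraining(reference, current, tolerance=DRIFT_TOLERANCE):
    return psi(reference, current, LUX_EDGES) > tolerance


reference = [100, 300, 500, 700, 900] * 20  # evenly spread lighting during training
slightly_darker = [100] * 22 + [300] * 20 + [500] * 20 + [700] * 20 + [900] * 18
much_darker = [100] * 50 + [300] * 50

print(needs_retraining(reference, slightly_darker))  # False (PSI is about 0.004)
print(needs_retraining(reference, much_darker))      # True
```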

(4) The Underspecification Challenge: Underspecification refers to a gap between the requirements that practitioners often have in mind when they build an ML model and the requirements that are actually enforced by the ML pipeline (i.e., the design and implementation of a model).

A painful consequence of underspecification is that even if the pipeline could in principle return a model that meets all of these requirements, there is no guarantee that in practice the model will satisfy any requirement beyond accurate prediction on held-out data.

Imagine a lab-based proof-of-concept for anomaly detection in pump operation. It detects relevant anomalies in certain frequency bands of the attached vibration sensor signals. However, the rollout of the pump monitoring solution fails because its ML model overlooks relevant anomalies that appear in very different frequency bands. It turns out that the ground truth of the relevant frequency bands depends strongly on a pump’s foundation material (concrete vs. wood vs. metal base), a fact that remained unspecified during the initial phases of the analytical work [8].
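Underspecification of this kind can at least be surfaced if the requirement names the operating conditions explicitly and the evaluation is broken down accordingly. The Python sketch below, with a hypothetical record format, segment names, and required recall, checks anomaly detection recall per foundation material and flags materials that miss the target or lack test data altogether. It does not cure underspecification; it only makes an unstated assumption visible.

```python
from collections import defaultdict


def recall_by_segment(records):
    """Share of true anomalies detected, per segment (e.g., pump foundation material).

    Each record is a hypothetical tuple: (segment, is_true_anomaly, was_detected).
    """
    hits, totals = defaultdict(int), defaultdict(int)
    for segment, is_anomaly, detected in records:
        if is_anomaly:
            totals[segment] += 1
            hits[segment] += int(detected)
    return {s: hits[s] / totals[s] for s in totals}


def unmet_segments(records, required_recall, required_segments):
    """Segments violating the requirement; a segment with no test data counts as recall 0."""
    recall = recall_by_segment(records)
    return sorted(s for s in required_segments if recall.get(s, 0.0) < required_recall)


REQUIRED_MATERIALS = ["concrete", "wood", "metal"]  # named in the specification

records = [
    ("concrete", True, True), ("concrete", True, True),
    ("concrete", True, True), ("concrete", True, False),
    ("wood", True, False), ("wood", True, True), ("wood", True, False),
    ("concrete", False, False),
]

print(unmet_segments(records, 0.7, REQUIRED_MATERIALS))  # ['metal', 'wood']
```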

(5) Underestimation of Cloud Resource Costs: Compute resources can account for up to 30% of a start-up’s spending, and cloud compute cost overruns are common [9]. Structural mistakes, such as ignoring the need for repeated model retraining to address pitfall (1), Costs of False Positives vs. False Negatives, can be caught by proper RE, which helps avoid a sudden explosion in cloud costs. The same holds for pitfall (3), Underestimation of Lifecycle Costs: managing data drift through continuous model updates can be costly in the cloud.
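The effect of pitfalls (1) and (3) on the cloud bill is simple multiplication, which is exactly why it is easy to miss when only the proof-of-concept is budgeted. The Python sketch below contrasts a one-off budget with a lifecycle budget; every number and field name is a hypothetical assumption, chosen only to show how retraining frequency and the number of customer-specific models multiply.

```python
from dataclasses import dataclass


@dataclass
class CloudCostAssumptions:
    """All figures hypothetical; in practice they come from the specification and cloud pricing."""
    training_run_cost: float     # cost of one (re)training run
    retrains_per_year: int       # driven by the specified drift tolerance
    environments: int            # customer-specific models, each retrained separately
    monthly_serving_cost: float  # inference cost per environment and month


def annual_cloud_cost(a: CloudCostAssumptions) -> float:
    training = a.training_run_cost * a.retrains_per_year * a.environments
    serving = a.monthly_serving_cost * 12 * a.environments
    return training + serving


poc_view = CloudCostAssumptions(2000.0, 1, 1, 300.0)          # train once, one site
lifecycle_view = CloudCostAssumptions(2000.0, 12, 10, 300.0)  # monthly retraining, ten sites

print(annual_cloud_cost(poc_view))        # 5600.0
print(annual_cloud_cost(lifecycle_view))  # 276000.0
```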

(6) The Overfitting Challenge: Data scientists striving to make a difference for one particular customer environment tend to develop ML models that overfit [10]. That is, they optimize models for one specific customer at the expense of proper generalization, and the models’ ability to work well in a second or third environment diminishes. RE can help avoid the overfitting trap by specifying a valid decision quality baseline (see pitfall (2)), which settles what is “good enough”, and by specifying how smooth the rollout to further, slightly different environments should be.

RE Paving the Way Forward

Do the first seeds of proper RE4AI, specifically the recent momentum in quality attributes and their validation methods, help with AI adoption in digital industries? Let us return to the macro-perspective.

When combined with model-based system design, RE has proven instrumental in setting up a common platform across disciplines such as mechanical CAD, eCAD, and software development. By definition, it fosters interdisciplinary collaboration. Why should we not leverage this essence of RE and expand into areas it has not yet covered, such as data science and ML? By its nature, RE must become a tool to bridge the still rather distant disciplines of ML and conventional development, in which developers code business logic deterministically in source code.

Second, using RE to master complex projects has historically had strong ties to the commercial side of business. RE supports scope definition, detailed specifications, dependency management between features and quality attributes, and test and homologation procedures. These are all tightly linked to effort estimation, Request for Quotation processes (and content), and contract design and execution, including such beloved disciplines as claim management.

So far, RE and ML have not been a love-at-first-sight story in the digital industries. However, once the two disciplines get closer, there is substantial potential to systematically accelerate the vital process of maturing ML development and industrial AI adoption. This maturing process will embrace a commercial perspective, which is instrumental in fueling wide-market adoption of any technology.

The RE community has the opportunity to become a catalyst for this development. We look forward to a phase of lively discussions about ML specifics, methodology, and tool support. We are curious about the experience of other industrial AI pioneers. Please reach out to any of the authors to join the discussion!

References

[1] McKinsey Global Institute. Notes from the AI Frontier: Modeling the Impact of AI on the World Economy, September 2018.
[2] A. Vogelsang and M. Borg. Requirements Engineering for Machine Learning: Perspectives from Data Scientists. In Proc. of the 6th International Workshop on Artificial Intelligence for Requirements Engineering (AIRE), 2019.
[3] G. Kim, K. Behr, and G. Spafford. The Phoenix Project. IT Revolution Press, 2013.
[4] C. O’Neil. Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy, 2017.
[5] Element AI. Maturity of AI.
[6] https://medium.com/vintage-investment-partners-stories/the-10-commandments-for-corporations-seeking-startup-innovation-part-3-e71ff2955e06
[7] J. P. Winkler, J. Grönberg, and A. Vogelsang. Optimizing for Recall in Automatic Requirements Classification: An Empirical Study. In Proc. of the IEEE 27th International Requirements Engineering Conference (RE), pp. 40-50, 2019.
[8] https://ai.googleblog.com/2021/10/how-underspecification-presents.html
[9] https://www.prnewswire.com/news-releases/new-survey-reveals-one-third-of-businesses-are-exceeding-their-cloud-budgets-by-as-much-as-40-percent-301216394.html
[10] https://www.unite.ai/andrew-ng-criticizes-the-culture-of-overfitting-in-machine-learning/