Pipeline Infrastructure Required to Meet the Requirements on AI

This is a personal copy of a column in IEEE Software (Jan/Feb 2023). Republished with permission.

The theme of this issue is Infrastructure as Code (IaC). This concept typically refers to the application of software engineering practices in managing deployment infrastructure. Instead of provisioning physical hardware, teams define virtual machines through configuration files that are far more flexible – and that can be version controlled and distributed. As such, IaC paves the way for automation and DevOps. This column touches upon similar topics: pipeline infrastructure to meet regulatory requirements on trustworthy AI solutions.

Markus Borg, Requirements Department Editor

We all know that artificial intelligence has surged lately in its uses, relevance, and impact. Finding new ways to ensure the quality of AI-enabled solutions is essential – especially in light of regulatory requirements. This will require a patchwork of different methods. RE is needed to make them application-specific! Whatever methods we come up with, the backbone supporting them will be pipeline infrastructure.

The last decade has seen enormous AI progress around the world. Data-driven solutions pop up in every thinkable – and unthinkable – application domain. As a consumer, I find that it resembles the “appification” triggered by smartphones some 15 years ago. Whatever digital action I wanted to take back then, someone always recited the trademarked(!) Apple slogan “There’s an app for that.” In the other major ecosystem, Google Play went from 0 to 2 million available apps in roughly five years.

Now I recognize a similar phenomenon for data-driven services. Whatever I want to do in the digital world, some AI appears to be waiting for me. A recommendation system for beer, some machine learning-driven image recognition of flowers, a generative model providing style transfers of photos or text… Or some other feature that promises to keep getting better as more data become available. But can we be sure that those AI-driven solutions actually do improve? This might not matter so much for entertainment apps such as those suggested above. But what about the increasing use of AI in critical applications?

Regulating AI to Increase Trust in the European Union

Big players in the US and China have spearheaded the recent AI development. In the past five years, many initiatives have been launched elsewhere to help others catch up – or to find AI niches. For example, the EU research agenda aims at positioning Europe at the forefront of “Trustworthy AI.” To drive progress toward this goal, the European Commission (EC) appointed an AI expert group in 2018 to support the EU AI strategy.

The expert group first provided the Ethics Guidelines for Trustworthy AI [1]. The guidelines specify expectations on AI providers, that is, important input to the system requirements. Moreover, the expert group complemented the guidelines with the Assessment List for Trustworthy AI [2]. This is a more actionable deliverable that helps AI providers do self-assessments using a checklist. In a nutshell, based on the work by the expert group, the EC states that Trustworthy AI systems shall be:

  • Lawful: respecting laws and regulations
  • Ethical: following ethical principles and values
  • Robust: from a technical perspective and taking into account its social environment

In April 2021, the EC proposed the ambitious Artificial Intelligence Act [3]. The AI Act, illustrated in Figure 1, is a new legal framework aspiring to turn Europe into the global hub for trustworthy AI. The proposed legislation is debated – not least its very inclusive definition of AI – but all signs point to increased regulation of AI. The potential fines for organizations violating the act resemble those of the established General Data Protection Regulation (GDPR). Make sure to comply or risk fines corresponding to 6% of the global annual turnover!

Figure 1: AI in the center of attention in the European Union.

The AI Act’s Risk-based Perspective

The AI Act specifies a risk-based approach to distinguish between applications. Some applications are simply prohibited as they introduce unacceptable risks to the EU’s core values, e.g., social scoring and real-time facial recognition in public spaces. At the other end of the spectrum, developers of video games and pure entertainment apps are considered “minimal risk”. No extra work is needed. However, entertainment apps that feature sophisticated AI such as deepfakes and emotion recognition must provide increased transparency – I believe we will return to that topic in the upcoming theme issue on “Explainable AI for Software Engineering”.

The major change will apply to providers of high-risk AI systems. High-risk means systems with an impact on 1) human health and safety or 2) the fundamental rights of EU citizens. The latter includes education, employment, immigration, and justice. In the proposed regulation, AI providers must register their systems in an EU database and rigorously document internal engineering activities. A national supervisory authority must also conduct a conformance assessment before deployment. Obviously, this will be very costly for AI providers.

The costs do not stop with the deployment. AI providers must ensure that their deployed systems stay conformant with the AI Act. Not so easy for systems that keep “improving” as more data become available! AI providers must find sustainable ways to conform to – and to keep conforming to – the AI Act. How can RE support this work?

Uncertainties Introduced When Tackling Uncertainties

Most improving AI systems use some flavor of Machine Learning (ML). From a Quality Assurance (QA) perspective, ML constitutes a paradigm shift compared to conventional software. This text looks at uncertainties throughout the system life cycle and the dual dependency on data and source code.

First, let’s consider why we need ML-based systems in the first place. ML is used to tackle situations in which we don’t really know what to expect. If we don’t know what the operational environment will bring, we need systems that can generalize. The systems need to find the best way to react to new input based on previous experience. ML is the most feasible way to provide this capability today. Right now, we need ML to tackle uncertainties! Luckily, there is a body of RE work on requirements for systems that operate under environmental uncertainty. Research on self-adaptive systems provides, for example, tailored requirements languages [4] and goal modeling [5].

Second, during development, ML engineers must accept the CACE principle coined by Google Research [6]: “Changing Anything Changes Everything.” Everything is entangled in the world of ML. Add another layer of neurons to the network, change the activation function, increase the learning rate, add more training data, preprocess the data some more… whatever change the ML engineer makes, it is very uncertain how the resulting ML model will perform. Conventional software change impact analysis is not enough here. Instead, ML engineers must experiment and try and try again. Hopefully, with mature tool support for experiment tracking.
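As a minimal sketch of what such experiment tracking could record – the function name track_experiment and the JSON-on-disk format are illustrative assumptions, not a reference to any particular tool – each CACE-style change would be logged together with its observed outcome:

```python
import hashlib
import json
import time
from pathlib import Path

# Hypothetical illustration: a minimal experiment-tracking record. Mature
# tracking tools offer far richer functionality; the point is only that every
# change (architecture, data, hyperparameters) is logged alongside its result.

def track_experiment(config: dict, data_version: str, metrics: dict,
                     log_dir: str = "experiments") -> Path:
    """Persist one experiment run: configuration, data version, and results."""
    record = {
        "timestamp": time.strftime("%Y-%m-%dT%H:%M:%S"),
        "config": config,              # hyperparameters, architecture choices, ...
        "data_version": data_version,  # e.g., a dataset hash or version tag
        "metrics": metrics,            # e.g., precision/recall on a fixed test set
    }
    # A content-derived run ID makes runs easy to deduplicate and reference.
    run_id = hashlib.sha1(json.dumps(record, sort_keys=True).encode()).hexdigest()[:10]
    out_dir = Path(log_dir)
    out_dir.mkdir(exist_ok=True)
    out_file = out_dir / f"run_{run_id}.json"
    out_file.write_text(json.dumps(record, indent=2))
    return out_file

# Usage sketch:
# track_experiment({"learning_rate": 1e-3, "layers": 4}, "camera-data-v2",
#                  {"recall": 0.91, "precision": 0.88})
```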

Third, during operations, ML-based systems must be continuously monitored. As stressed in the proposed AI Act, it is not enough to just rely on regular retraining of the ML models as more data become available. The environment may change fundamentally during the lifetime of the system. Perhaps users will start providing input from a new type of camera. Or e-scooters suddenly appear in the traffic environment. In e-commerce, maybe a product category turns illegal – suddenly, the AI provider must remove all related training data. Whenever drastic changes happen, ML engineers might need to redesign their models, scrap some data, or do something else available in the data science toolbox. And according to the CACE principle, it is highly uncertain what the impact would be.
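As one hedged example of such a monitoring check – the population stability index below, its 0.2 threshold, and the helper names are illustrative assumptions rather than anything mandated by the AI Act – drift in a single input feature could be flagged like this:

```python
import numpy as np

# Hypothetical sketch of one monitoring check: population stability index (PSI)
# between a training-time baseline and a window of operational data.
# The 0.2 threshold is a common rule of thumb, not a regulatory requirement.

def population_stability_index(baseline: np.ndarray, current: np.ndarray,
                               bins: int = 10) -> float:
    """Quantify how much the distribution of one feature has shifted."""
    edges = np.histogram_bin_edges(baseline, bins=bins)
    base_pct = np.histogram(baseline, bins=edges)[0] / len(baseline)
    curr_pct = np.histogram(current, bins=edges)[0] / len(current)
    # Avoid division by zero and log(0) for empty bins.
    base_pct = np.clip(base_pct, 1e-6, None)
    curr_pct = np.clip(curr_pct, 1e-6, None)
    return float(np.sum((curr_pct - base_pct) * np.log(curr_pct / base_pct)))

def drift_alarm(baseline: np.ndarray, current: np.ndarray,
                threshold: float = 0.2) -> bool:
    """Raise a flag when distribution drift exceeds the chosen threshold."""
    return population_stability_index(baseline, current) > threshold

# Usage sketch: image brightness from training data vs. a new camera type.
# if drift_alarm(train_brightness, last_week_brightness):
#     trigger_retraining_review()   # hypothetical downstream action
```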

MLOps Infrastructure for Sustainable Automation

The only feasible way to tackle the uncertainties is through increased levels of QA automation. In the AI era, this is done in the context of MLOps, “the standardization and streamlining of ML lifecycle management” [7]. The idea is to combine ML and data engineering with DevOps in highly automated development pipelines that support traceability and reproducibility [8]. These last two -ilities are cornerstones in any trustworthiness argumentation. The RE community is good at working with -ilities, and our skills will certainly come in handy now! We need to specify quality targets and help devise methods to monitor them during operations. The RE perspective must be integrated throughout the lifecycle.
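To make the idea of quality targets concrete – the requirement IDs, metric names, and thresholds below are invented for illustration, not drawn from the AI Act or any standard – such targets could be captured in a machine-readable form that a dashboard evaluates continuously:

```python
from dataclasses import dataclass

# Hypothetical sketch: quality targets expressed as machine-readable
# requirements that operations dashboards can check continuously.

@dataclass
class QualityTarget:
    requirement_id: str   # traceable back to a regulatory or system requirement
    metric: str           # name of the monitored metric
    minimum: float        # acceptable lower bound during operations

TARGETS = [
    QualityTarget("REQ-TRUST-01", "human_detection_recall", 0.95),
    QualityTarget("REQ-TRUST-02", "per_frame_latency_budget_met", 0.99),
]

def evaluate(targets: list, observed_metrics: dict) -> list:
    """Return the requirement IDs that are currently violated."""
    return [t.requirement_id for t in targets
            if observed_metrics.get(t.metric, 0.0) < t.minimum]

# Usage sketch:
# violations = evaluate(TARGETS, {"human_detection_recall": 0.93})
# A non-empty list would feed the compliance dashboard and the assurance case.
```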

MLOps pipelines can be configured to host a new generation of advanced engineering tools. As ML engineering intertwines data and source code, the pipelines must be composed of engineering tools that support QA accordingly. Just as every AI system is unique, its enabling MLOps pipeline must be tailor-made for that specific system.

Depending on the application, AI providers need to combine different QA techniques. There are numerous solution proposals for AI QA in the research literature. Examples include data completeness checks, formal verification of ML models, neural network adequacy testing, and simulation-based system testing. As usual in QA, organizations cannot put all their eggs in one basket – a convincing assurance case will require a set of complementary methods.
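A data completeness check is perhaps the simplest of these complementary methods. The sketch below – with made-up column names and thresholds – shows how such a check could gate the data entering a pipeline:

```python
import pandas as pd

# Hypothetical sketch of a data completeness check, one of the complementary
# QA methods mentioned above. Column names and thresholds are illustrative.

REQUIRED_COLUMNS = {"image_id", "camera_id", "timestamp", "label"}
MAX_MISSING_FRACTION = 0.01  # at most 1% missing values per column

def check_completeness(df: pd.DataFrame) -> list:
    """Return a list of human-readable findings; an empty list means pass."""
    findings = []
    missing_cols = REQUIRED_COLUMNS - set(df.columns)
    if missing_cols:
        findings.append(f"Missing required columns: {sorted(missing_cols)}")
    for col in REQUIRED_COLUMNS & set(df.columns):
        frac = df[col].isna().mean()
        if frac > MAX_MISSING_FRACTION:
            findings.append(f"Column '{col}' has {frac:.1%} missing values")
    return findings

# Usage sketch, gating the Data Testing phase of the pipeline:
# findings = check_completeness(training_batch)
# if findings:
#     fail_pipeline(findings)   # hypothetical helper; block training on bad data
```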

Figure 2 shows a conceptual model of an MLOps pipeline. The pipeline is tailored for an ML-based surveillance application that uses camera input to detect approaching humans. The upper part of the figure shows (1) an ML component – intertwining data and source code – entering the pipeline (2). The cubes in the pipeline illustrate flexibility in pipeline configurations. Various QA methods can be applied in different pipeline phases.

Figure 2: Conceptual MLOps pipeline for a surveillance application.

The MLOps pipeline model consists of three phases each for (3) training and (4) deployment of the ML model: Data Testing, Model Training, and Model Testing on the training side; Packaging, Deployment in Test Environment, and Deployment in Production Environment on the deployment side. In the last phase of the pipeline, the ML model is put into production. When in operation (5), the model is continuously monitored. Feedback from development and operations is aggregated in dashboards (6) where requirements for regulatory compliance can be continuously monitored. Together, this gives a high-level picture of a pipeline infrastructure that could help organizations maintain an assurance case over the lifecycle of an AI system.
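A rough skeleton of these six phases, with placeholder step functions standing in for whatever workflow tooling an AI provider actually uses, could look as follows:

```python
# Hypothetical skeleton of the six pipeline phases in Figure 2, expressed as a
# sequence of gated steps. Names are placeholders, not a specific tool's API.

PHASES = [
    ("data_testing", "Validate completeness and distribution of incoming data"),
    ("model_training", "Train the ML model under a tracked configuration"),
    ("model_testing", "Check accuracy, robustness, and fairness targets"),
    ("packaging", "Bundle model, preprocessing code, and metadata"),
    ("deploy_test_env", "Run system-level tests, e.g., simulation-based"),
    ("deploy_production", "Release, then hand over to continuous monitoring"),
]

def run_pipeline(steps: dict) -> bool:
    """Execute the phases in order; stop at the first failing quality gate."""
    for name, description in PHASES:
        print(f"Running {name}: {description}")
        if not steps[name]():          # each step returns True on success
            print(f"Pipeline stopped: {name} failed its quality gate")
            return False
    return True

# Usage sketch:
# run_pipeline({"data_testing": check_data, "model_training": train, ...})
```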

Trust That AI – Automate its Backbone

Trustworthiness is a quality that must be built in during development. This is analogous to the related system qualities of safety and security. One cannot simply let a production-ready AI system run a gauntlet of test suites to assure its trustworthiness before deployment. As shown in the AI Act, an assurance case for trustworthiness – a structured argumentation backed up by evidence – is more about the development and governance processes than the end product. And building in trustworthiness requires an enabling infrastructure backbone – a pipeline.

We are in the early days of using AI in critical applications. RE is now needed to break down the regulatory stipulations and clauses into product and process requirements. Proposed legislation such as the AI Act presents clear expectations on what to argue in terms of trustworthiness. Unfortunately, the gap between those expectations and operational methods for the argumentation is substantial. For example, we don’t know what artifacts to provide as evidence in an AI Act conformance case. There are plenty of opportunities for research and practice to find and share best practices – we are in for some exciting times in the RE4AI community.

Whatever we come up with, automation will be needed to make it sustainable. Any novel solution must fit into the infrastructure of an MLOps pipeline. We also know that each AI system will require a tailored pipeline co-evolving with the system. This brings another RE question: How to best specify requirements on the pipeline infrastructure to support trustworthiness throughout the lifecycle of AI systems?

Please reach out if you would like to continue that discussion! We just started a five-year research project related to this topic. Collaborations and reflections would be most welcome.

References

  1. https://digital-strategy.ec.europa.eu/en/library/ethics-guidelines-trustworthy-ai (Accessed: 2022-09-26)
  2. https://digital-strategy.ec.europa.eu/en/library/assessment-list-trustworthy-artificial-intelligence-altai-self-assessment (Accessed: 2022-09-26)
  3. https://eur-lex.europa.eu/legal-content/EN/TXT/?uri=CELEX%3A52021PC0206 (Accessed: 2022-09-26)
  4. Whittle, J., Sawyer, P., Bencomo, N., Cheng, B. H., & Bruel, J. M. RELAX: a language to address uncertainty in self-adaptive systems requirement. Requirements Engineering, 15(2), 177-196, 2010.
  5. Morandini, M., Penserini, L., Perini, A., & Marchetto, A. Engineering requirements for adaptive systems. Requirements Engineering, 22(1), 77-103, 2017.
  6. Sculley, David, et al. “Hidden technical debt in machine learning systems.” Advances in Neural Information Processing Systems, 28, 2015.
  7. Treveil, Omont, Stenac et al. Introducing MLOps. O’Reilly Media, Inc., Sebastopol, CA, USA, 2020.
  8. Mäkinen, Skogström, Laaksonen, and Mikkonen. Who Needs MLOps: What Data Scientists Seek to Accomplish and How Can MLOps Help?. In Proc. of the 1st Workshop on AI Eng., pp. 109-112, 2021.