
This is a personal copy of a column published in IEEE Software (Mar/Apr 2026). Republished with permission.
Outside safety engineering circles, “AI safety” is often a catch-all term. It is used for issues ranging from existential risks and alignment problems to the societal implications of deepfakes and model-level hallucination. In regulated domains, however, safety has a very specific meaning. It’s about preventing real-world harm in a particular system and operating context. And when machine learning (ML) enters safety-critical systems, the hard part is no longer training yet another high-performing model. Instead, it’s specifying, refining, tracing, verifying, and validating the right safety requirements – for the system, for the ML component, and even for the data. In this column, I join Hawkins and colleagues, leading experts in safety engineering, to discuss what is really needed for safe ML. Whether or not you work on safety-critical applications, there is something to learn here – and maybe even more if you are an ML developer. – Markus Borg
Richard Hawkins, Colin Paterson, Ibrahim Habli, and Markus Borg.
The great exhibition hall is buzzing. It’s the highlight of the year for the capital’s chess community: the Grand Open and its surrounding convention. Boards are packed, clocks are ticking, and everyone is having a good time. Until a boy’s heartbreaking scream cuts through the hum. People rush toward one of the busiest booths. A seven-year-old’s finger has been mistaken for a piece, and the chess robot has just made a highly illegal move… the fracture happens in an instant inside the powerful gripper.
This is close to what happened during the Moscow Open in 2022. “This is of course bad,” commented the president of the Moscow chess federation [1]. We agree. It stands as a stark reminder of the importance of safety requirements in the design of autonomous systems.
Machine learning (ML) has been shown to outperform humans across a range of tasks crucial to autonomous systems. Perception and object classification in images are typical examples. However, like any new technology, alongside new capabilities come new risks, often linked to new and uncertain modes of failure.
What is Safety of ML?
Safety is rarely described in absolute terms. It is often qualified through the notion of risk, which is defined by the likelihood and severity of an undesirable outcome. This outcome commonly takes the form of harm to humans or damage to property and the environment. In this regard, we can define safety as freedom from unacceptable risk of harm [5].
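The idea of risk as a combination of likelihood and severity, judged against an acceptability threshold, can be sketched as a toy risk matrix. The scales and cut-offs below are purely illustrative assumptions, not values from any standard:

```python
from enum import IntEnum

class Likelihood(IntEnum):
    """Toy ordinal likelihood scale (illustrative only)."""
    RARE = 1
    OCCASIONAL = 2
    FREQUENT = 3

class Severity(IntEnum):
    """Toy ordinal severity scale (illustrative only)."""
    NEGLIGIBLE = 1
    MARGINAL = 2
    CATASTROPHIC = 3

def risk_class(likelihood: Likelihood, severity: Severity) -> str:
    """Classify a hazard by combining likelihood and severity.

    Real risk matrices are domain-specific and defined by safety
    standards; the score bands here are invented for illustration.
    """
    score = likelihood * severity
    if score <= 2:
        return "acceptable"
    if score <= 4:
        return "tolerable with mitigation"
    return "unacceptable"
```

The point of the sketch is that “freedom from unacceptable risk” is a judgment over both dimensions: a frequent but negligible failure and a rare but catastrophic one land in very different cells.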
When discussing ML safety, we are concerned with the specific types of harm caused by its use, particularly when ML is the primary technology implementing critical functions [6]. Examples include perception (e.g., detecting pedestrians in autonomous vehicles) and decision-making (e.g., advising doctors on possible treatments).
Establishing confidence in safety, or assurance, especially for a complex technology like ML, requires transparency and clear communication. This involves engaging the appropriate stakeholders, i.e., those most familiar with or exposed to the risk, to make an informed judgment about risk acceptability. Beyond engineers and regulators, these stakeholders include professional users such as pilots and doctors, and those directly exposed to risk such as passengers and patients.
Safety cases are frequently used to provide this kind of assurance. They do so by presenting an argument, supported by evidence, that a system or functionality is acceptably safe for a particular purpose in a specific context. Effective safety assurance hinges on analyzing the system to derive safety requirements sufficient to mitigate the risk of identified safety hazards. In our work, we focus on hazards resulting from ML function failure conditions (e.g., misclassifying an urgent cancer treatment need as a routine check).
It’s important to note here that safety can only ever be defined, and safety hazards understood, in terms of a particular system and a particular operating context. Safety, whether for ML or any other type of system, must therefore be grounded in established and proven safety engineering approaches. In many ways, the context-sensitive nature of safety in general begins to explain the principal safety assurance challenge for ML.
Requirements for Safe Behavior
Safety requirements define how the system must behave to be acceptably safe. Where these safety requirements relate to ML components, we must ensure that we create models that can be demonstrated to meet them. Safety is rarely concerned just with overall performance. The ML component must be free from unacceptable risk in all situations it may encounter, and we must be able to demonstrate this. For systems that must operate in complex, open, and dynamic environments, this is particularly challenging, as the operational space is vast and uncertain – just think of a self-driving car in an urban setting.
In ML terms, it’s the robustness of the model that is generally most critical to its safety. Whilst ML engineers are familiar with the development of robust models, it is vital that we are very clear about what we mean by robustness when we consider safety.
Once again, the key distinction is that we cannot consider the ML component in isolation. There will be many failures of ML components in particular circumstances that we don’t care about from a safety perspective, since they don’t pose a safety hazard for that system application. This is not to say that these don’t matter; just that the lack of robustness of the model in such circumstances does not affect the safety of the system.
Figure 1 illustrates how failures in an ML component can pose a safety hazard to a particular system. For this to occur, there must be a propagation of the ML failure contributing to failure in the broader software, then to failure in the broader system or platform, and then, through the resulting unintended behavior of the system, ultimately to harm to people who interact with it.

A well-designed system will include numerous barriers that prevent failures from propagating throughout the system. These barriers are illustrated as green lines in Figure 1, and could include technical and architectural mitigations such as design redundancy and monitoring, as well as socio-technical mitigations involving human oversight or control. It is only by analyzing and understanding the potentially complex failure modes and causal chains within the system that the safety contribution of the ML component can be defined.
Safety Cases for ML
We don’t just need to develop ML that satisfies the safety requirements – we must also demonstrate, with sufficient confidence, that those requirements will continue to hold in operation. Ultimately, we need to create a safety case for the ML.
Creating compelling safety cases is difficult. Creating compelling safety cases for ML components is even more challenging, particularly because there are no established guidelines for doing so. This motivated us to develop AMLAS [2].
AMLAS provides practical guidance on tackling ML safety [3]. As shown in Figure 2, AMLAS is split into six stages. Each of these stages consists of a set of activities to generate assurance artifacts for use as evidence in an ML safety case. AMLAS stages run in parallel with the ML development, so assurance is integrated from the start and can trigger iterative improvement.

As discussed earlier, safety requires more than just good performance. Thus, using AMLAS is more than simply following good ML development practices – we expect that good practice is followed anyway when developing these kinds of systems! When we consider requirements, there are four key aspects of AMLAS that highlight this difference.
Scoping and ML Safety Requirements
Firstly, the assurance scoping undertaken at stage 1 requires the definition and justification of the context in which the safety of the ML is demonstrated. This requires an understanding of the system-specific issues that determine how the ML contributes to safety hazards. As previously discussed, this will include the overall system architecture, the system’s operating context, and the system safety requirements determined through the system safety analysis process.
It is a crucial prerequisite of AMLAS that these scoping activities are done. For ML developers, this will likely require interactions with system and safety engineers that would otherwise not occur. Such interactions can be challenging due to differences in domain knowledge, terminology, and technical expertise [4].
This feeds into the second key difference we would highlight in AMLAS, relating to stage 2. This stage considers the specification and justification of a set of ML safety requirements. ML components are inherently underspecified, so when safety is involved, we must be particularly clear about their requirements.
The natural language safety requirements defined during the scoping stage of AMLAS are not in a form directly useful for model learning or verification. The safety requirements must therefore be translated into requirements meaningful to ML engineers that can still be demonstrated to maintain the intent of the original safety requirements.
This is a challenge due to the large gap between the real-world concepts considered in safety requirements and the detailed metrics required for ML development. The safety case plays a key role here in explaining the sufficiency of this translation. The complex nature of safety requirements for most systems means that single metrics are unlikely to be sufficient when deriving ML safety requirements. Once more, for ML developers, the specification and justification of requirements in this way are unlikely to be standard tasks and will likely require further interaction with system and safety engineers.
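One way to picture this translation is as a set of measurable thresholds, sliced by operating condition, that together stand in for a real-world safety requirement such as “detect pedestrians reliably.” The conditions, metric names, and numbers below are hypothetical assumptions for illustration; deriving and justifying the actual values is exactly the hard work the safety case must explain:

```python
# Hypothetical ML safety requirements derived from a system-level
# requirement. Keys are (operating condition, metric); values are
# thresholds. All names and numbers are illustrative, not normative.
ML_SAFETY_REQUIREMENTS = {
    ("daylight", "pedestrian_recall"): 0.99,
    ("night", "pedestrian_recall"): 0.97,
    ("daylight", "false_positive_rate_max"): 0.01,
}

def check_requirements(measured: dict) -> list:
    """Return the requirements that the measured metrics fail to meet.

    Metrics ending in '_max' are upper bounds; all others are lower
    bounds. A missing measurement counts as a failure.
    """
    failures = []
    for (condition, metric), threshold in ML_SAFETY_REQUIREMENTS.items():
        value = measured.get((condition, metric))
        if metric.endswith("_max"):
            ok = value is not None and value <= threshold
        else:
            ok = value is not None and value >= threshold
        if not ok:
            failures.append((condition, metric, threshold, value))
    return failures
```

Note that the sketch already embodies the column’s point that single aggregate metrics are unlikely to suffice: the same metric is constrained separately per operating condition.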
Data Requirements as First-Class Citizens
Thirdly, the data management in AMLAS is a highly focused stage that seeks to assure the appropriateness of the data sets for creating safe ML models. By contrast, a common approach in ML is to build the best possible model from the available data. Such an approach does not allow us to justify the sufficiency of the data used and thus to provide the required assurance in the learned model. For this reason, AMLAS requires explicit data requirements that define the properties the data sets must have.
Stage 3 of AMLAS requires that the traceability from the data requirements to the ML safety requirements is established and justified. ML data requirements must consider a range of properties such as relevance, completeness, balance, and accuracy, all defined within the context of the safety requirements. Specifying and validating data sets in this way is often a new challenge for ML developers.
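Making data requirements explicit means they can be checked mechanically. The sketch below validates completeness and balance of a data set against required minimum shares per operating condition; the conditions and shares are hypothetical assumptions, and real data requirements would also cover relevance and accuracy, which need domain judgment rather than counting:

```python
from collections import Counter

# Hypothetical data requirements traced back to ML safety requirements:
# each operating condition must be represented with at least this share
# of the data set. Names and numbers are illustrative only.
REQUIRED_CONDITIONS = {"daylight": 0.2, "night": 0.2, "rain": 0.1}

def check_data_requirements(samples: list) -> dict:
    """samples: one operating-condition label per data point.

    Returns a mapping from each violating condition to its actual
    share, covering completeness (condition present at all) and
    balance (condition sufficiently represented).
    """
    counts = Counter(samples)
    total = len(samples)
    violations = {}
    for condition, min_share in REQUIRED_CONDITIONS.items():
        share = counts.get(condition, 0) / total if total else 0.0
        if share < min_share:
            violations[condition] = share
    return violations
```

A check like this produces an auditable artifact for the safety case, rather than an implicit “the data looked fine” judgment.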
Process Requirements for Verification Evidence
The final aspect of AMLAS that we highlight here is the requirement for independent verification of the ML component. Although testing is an inherent part of any ML development process, it often focuses solely on developers’ internal testing of model performance. From a safety assurance perspective, verification of ML components has a distinct and crucial role.
Firstly, it is uniquely focused on generating evidence that the ML complies with the ML safety requirements. More importantly, however, it must demonstrate the generalizability of that safe behavior across all situations that the system may encounter in operation. From a safety perspective, this means there must be a particular focus on those situations that are most likely to lead to hazards, which are often particularly low-probability “edge cases.”
Stage 5 of AMLAS asks for a different mindset. In contrast to the development data that should lead to a good model, the verification data set should be designed to make the model fail. If it fails, we reveal a limitation in the model’s generalizability to the operating domain, which must be corrected for the model to be safe. If the ML does not fail, this provides confidence in the model’s generalizability, which supports the safety case. The strength of the argumentation relies on both the sufficiency and the independence of the verification data. Independence does not require a separate organization, but the verification data sets should remain invisible to the development team.
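One small, mechanical facet of that independence can be enforced in tooling: verifying that no verification example also appears in the development data. The sketch below uses content hashing for this; it is a necessary but far from sufficient check (organizational separation and independent data sourcing matter just as much), and the function names are our own illustration:

```python
import hashlib

def fingerprint(example: bytes) -> str:
    """Content hash used to detect leakage between data sets."""
    return hashlib.sha256(example).hexdigest()

def assert_independent(dev_set, verification_set):
    """Raise if any verification example leaks into development data.

    This only catches exact duplicates; near-duplicates and shared
    collection pipelines also undermine independence and need
    separate, more careful analysis.
    """
    dev_hashes = {fingerprint(x) for x in dev_set}
    leaked = [x for x in verification_set if fingerprint(x) in dev_hashes]
    if leaked:
        raise ValueError(
            f"{len(leaked)} verification examples also occur in development data"
        )
```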
As one follows the six stages of AMLAS and carries out the defined activities, one can build up the evidence and argument for the ML safety case. This safety case can be used to demonstrate that the ML component is sufficiently safe, but only within the specific system and environmental context within which AMLAS was applied. The ML safety case becomes part of the overall safety case for the wider system, and system-level integration and verification become critical considerations.
AMLAS has now been successfully adopted as a means for creating safety cases for ML systems in a range of domains, including healthcare, automotive, and defense. The level of interest and uptake shows a clear appetite for assuring the safety of ML systems.
It is clear, however, that AMLAS is only a starting point, and it must not distract from the established principles of developing and assuring safe systems. The continued development of ML and AI technologies, including the use of large language models, of course raises new questions and challenges that must be addressed. The increasing misappropriation of the term “AI safety” as almost exclusively focusing on existential risk is unhelpful in this regard.
As AMLAS continues to evolve, so does the broader conversation about safe ML. What would you need, as an engineer or chess parent, to trust a robot to play against humans? We’d love to hear your thoughts!
References
- [1] The Guardian. “Chess robot grabs and breaks finger of seven-year-old opponent,” Jul. 2022. https://www.theguardian.com/sport/2022/jul/24/chess-robot-grabs-and-breaks-finger-of-seven-year-old-opponent-moscow
- [2] C. Paterson, R. Hawkins, C. Picardi, Y. Jia, R. Calinescu, and I. Habli, “Safety assurance of machine learning for autonomous systems,” Reliability Engineering & System Safety, p. 111311, 2025.
- [3] R. Hawkins, C. Paterson, C. Picardi, Y. Jia, R. Calinescu, and I. Habli, “Guidance on the Assurance of Machine Learning in Autonomous Systems (AMLAS),” arXiv preprint arXiv:2102.01564, 2021.
- [4] N. Nahar, S. Zhou, G. Lewis, and C. Kästner, “Collaboration challenges in building ML-enabled systems: Communication, documentation, engineering, and process,” in Proc. 44th International Conference on Software Engineering, 2022, pp. 413–425.
- [5] W. W. Lowrance, Of Acceptable Risk: Science and the Determination of Safety, 1976.
- [6] I. Habli, “On the Meaning of AI Safety,” in 2025 20th European Dependable Computing Conference Companion Proceedings (EDCC-C), IEEE, 2025, pp. 185–188.


