Why VDE-AR-E 2842-61 (Trustworthy AI Systems) Doesn’t Just Apply to Development

With VDE-AR-E 2842-61, the VDE has developed a whole family of standards for trustworthy autonomous/cognitive systems such as AI systems. Even though these “application rules” are not specific to a particular domain, e.g., medical devices, they are still a treasure trove for many medical device manufacturers.

This article will explain to you:

  • What AI systems are
  • Which manufacturers should take which parts of this family of standards into account
  • The specific suggestions these parts contain
  • Why VDE-AR-E 2842-61 by no means only affects development

1. Trustworthy AI systems

a) Definitions and distinctions

AI systems are a subset of autonomous/cognitive systems that use artificial intelligence techniques.

Autonomous/cognitive systems are defined as follows:

A system is described as autonomous (cognitive) if it can achieve a specified goal independently and in a manner adapted to the situation without human control or detailed programming.

Source: Fachforum Autonome Systeme im Hightech-Forum: Autonome Systeme – Chancen und Risiken für Wirtschaft, Wissenschaft und Gesellschaft. Long version, final report, Berlin, April 2017

The definition in the application rule is:

Definition: autonomous/cognitive system – A/C-system (German: autonom/kognitives System, A/K-System)

“is a technical system that is able to generate autonomous and cognitive behavior. Within the context of this AR an A/C-system is part of a solution.

The term autonomous/cognitive system and especially the German variant “autonom/kognitives System” is a new term, a made-up word coined by this AR. It denotes the special characteristic of complex systems in complex environments (covered by the solution level), trustworthiness aspects and potentially but not necessarily the use of AI in one or more elements of the system. Furthermore it takes into account the common use of “autonomous” in the public along with the expectations on complex behavior of such systems (e.g. in a shop one would rather order an “autonomous car” than a “fully automated car”).”

VDE-AR-E 2842-61-1, section 3.1.8

The standard does not define the property “autonomous” in quite the same way, but in the context of the standard, “autonomous” means “without human control.” The standard also uses the terms “cognitive” and “cognitive loop” to describe situation-specific behavior.

Because there is a very large number of situations that such systems have to be able to react to, “detailed programming” is usually not possible. Therefore, many autonomous/cognitive systems use artificial intelligence techniques.

Conversely, not every system that uses AI is an AI system, and therefore not every such system is an autonomous/cognitive system. For example, software that uses AI to detect cancer on a CT image is not an AI system according to this definition.

Fig. 1: Differentiation between autonomous/cognitive systems, AI systems and devices that use AI.

 Additional information

Read the article on autonomous systems to find out what the specific advantages and risks of such systems are and what regulatory requirements have to be complied with.

This article uses the term “AI system” from here on, as this is the more popular term and “AI systems” fall within the scope of VDE-AR-E 2842-61.

b) Examples of AI systems

Examples of AI systems include:

  • Disinfection robots, like the ones from XENEX
  • Robots that perform tasks in medical laboratories, such as the ones from ABB
  • Robot nurses and other robots in hospitals, for example, autonomous cooperative surgical robots that independently perform surgeries in full or in part and may even be used by non-medical professionals in the future (e.g., in remote areas)
  • Artificial digital pancreases

c) Trustworthiness

Trustworthiness should be understood here as a meta-term that covers safety, cybersecurity, effectiveness, usability, etc.

Trustworthiness […] combines several aspects of trustworthiness in a quite generic way: for every product the set of aspects can be suitably selected and remains unchanged throughout the project. Aspects of trustworthiness include but are not limited to system safety, functional safety, safety of use, security, usability, ethical and legal compliance, reliability, availability, maintainability, and (intended) functionality.

VDE-AR-E 2842-61-1 section 3.1.43

Fig. 2: Aspects of trustworthiness

2. Specific risks of AI systems

a) Risks of autonomous/cognitive systems (in general)

The article on autonomous systems has already detailed some of the risks that are specific to this class of system. These include risks resulting from:

  • The autonomy of the systems
  • Different technical contexts
  • Different clinical contexts
  • Adaptive algorithms
  • Lack of interoperability

There are also additional risks specific to AI systems, which are described in the following sections.

b) Risks caused by an inadequate intended purpose

Manufacturers must define an intended purpose that specifies the

  • Users
  • Use environments
  • Patients
  • Diseases, diagnoses and contraindications
  • Use cases
  • Markets

for the AI system.

If the intended purpose is not clear with regard to these aspects, the foundations for all subsequent development phases will be lacking. For example, it must be clear whether the surgical robot can also be used to operate on a knee that already contains implants.

Cobots (collaborative robots), in particular, can be used for different purposes with simple reprogramming or “teaching.” However, this doesn’t mean that the manufacturer is saved from having to define the intended purpose for these individual use cases.

c) Risks caused by unknown situations

Because it is very difficult to predict every situation, manufacturers do not always manage to produce complete specifications.

Even when manufacturers anticipate a situation, it is often difficult to specify the optimal system behavior for each situation.

Without these precise specifications and product requirements, development departments and data scientists will find it difficult to derive specific requirements for AI models and for collecting data for their training.

If a situation was not anticipated during the product specification and development phases, the behavior of the product in this situation is not always predictable.

d) Risks caused by gaps in the specifications

The requirements must be clearly and specifically documented at all abstraction layers in development. This also applies to AI systems and the AI components they contain.

During the development phase in particular, self-learning AI components tempt developers to write unclear specifications as these AI components will “learn what they have to do.” But this is a misunderstanding: AI components can learn “how” to do something, but not the “what.”

AI components cannot be used as a catch-all for unclear or inaccurate specifications. This would doom the development to failure. Therefore, the requirements, including the trustworthiness attributes (performance, safety, security, usability, etc.), must be clearly traceable.

e) Risks caused by inadequate training of the models

Any technical errors made during the development of AI models can also lead to risks from AI systems, for example:

  • Errors when collecting and preparing the training data (e.g., incorrect unit conversion)
  • Non-representative and too little training data
  • Choice of suboptimal architectures and hyperparameters
  • Optimization of the model for wrong target values (e.g., accuracy instead of sensitivity)
  • Over-fitting
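
The point about wrong target values (accuracy instead of sensitivity) can be illustrated with a minimal sketch. The data set and the "always healthy" model below are hypothetical and only serve to show why accuracy alone can hide a clinically useless model when diseased cases are rare.

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Share of all cases that were classified correctly."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def sensitivity(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Share of diseased cases (label 1) that the model detected."""
    positives = sum(1 for t in y_true if t == 1)
    detected = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    return detected / positives


# Hypothetical test set: 5 diseased cases (1) and 95 healthy cases (0).
y_true = [1] * 5 + [0] * 95
# A useless "model" that always predicts "healthy".
y_pred = [0] * 100

acc = accuracy(y_true, y_pred)      # high, because healthy cases dominate
sens = sensitivity(y_true, y_pred)  # zero, because no diseased case is found
print(f"accuracy={acc:.2f}, sensitivity={sens:.2f}")
```

With these assumed numbers the script prints `accuracy=0.95, sensitivity=0.00`: a model optimized only for accuracy could look acceptable while missing every diseased case.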

A comprehensive collection of best practices for minimizing these risks can be found in the Johner Institute’s AI Guidelines, a modified version of which is used by notified bodies.

f) Risks caused by poor usability

In the case of AI systems especially, poor usability can lead to particularly high risks. This “human factor” aspect can even lead to an “irony of automation.” These ironies include:

  • Unjustified confidence in automation
  • Unjustified mistrust in automation
  • Ignorance of the limits of automation
  • Inability to decide when humans need to intervene
  • Higher instead of lower complexity

g) Risks caused by malicious use

Deep fakes are just one example of how AI can be abused. In the case of medical devices that use AI, it has been shown, at least in the laboratory, how systems for the classification of images can be fooled or caused to divulge sensitive (training) data.

3. VDE-AR-E 2842-61: ensuring the trustworthiness of AI systems

Application rule VDE-AR-E 2842-61 aims to contribute towards controlling these risks and thus ensuring the trustworthiness of AI systems.

In this context, VDE-AR-E 2842-61 claims to cover the entire product life cycle from the product idea through to the phase known to the medical device world as “post-market surveillance.”

Fig. 3: Application rule VDE-AR-E 2842-61 consists of several parts that cover the entire life cycle of AI systems. The numbers refer to the parts / volumes of this application rule. (Source: Dr. Henrik Putzer, VDE-AR-E 2842-61 working group)

a) “Initiation”: determining the intended purpose

VDE-AR-E 2842-61 offers several approaches to handling the risks posed by AI systems in the event of incomplete or unclear intended purposes.

Problem: For generic systems such as cobots (see above), no specific intended purpose is defined. These systems can be easily reprogrammed or adapted for new intended purposes.

Solutions in VDE-AR-E 2842-61: Concepts for generic proofs of safety (“trustworthiness out of context”) based on the automotive standard ISO 26262

Problem: No clear expectations, but a belief that “the AI will somehow solve everything intelligently”

Solutions in VDE-AR-E 2842-61: Definition of use cases and the intended benefit. These are backed up with a clear ontology, which is also used in the requirements description (incl. traceability) and is refined and made usable in later phases up to data set coverage metrics.

Problem: The AI system is not considered over its entire life cycle. For example, aspects of the intended use, such as the update and maintenance of the system, are missing.

Solutions in VDE-AR-E 2842-61: Model the product lifecycle using a customer journey map or UX/experience map
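
How an ontology might feed a data set coverage metric can be sketched as follows. This is one possible reading, not the metric defined in the application rule: the ontology is reduced to a few named dimensions with discrete values, and coverage is the share of value combinations that occur at least once in the data set. The function name `situation_coverage`, the dimensions and the sample records are all hypothetical.

```python
from itertools import product
from typing import Dict, List, Sequence, Tuple


def situation_coverage(
    dimensions: Dict[str, Sequence[str]],
    samples: Sequence[Dict[str, str]],
) -> Tuple[float, List[Tuple[str, ...]]]:
    """Return the covered share of value combinations and the missing ones."""
    names = list(dimensions)
    # Every combination of ontology values counts as one required situation.
    required = set(product(*(dimensions[n] for n in names)))
    # Situations that actually occur in the data set.
    seen = {tuple(sample[n] for n in names) for sample in samples}
    missing = sorted(required - seen)
    covered = len(required) - len(missing)
    return covered / len(required), missing


# Hypothetical ontology excerpt for a surgical robot (illustration only).
dimensions = {
    "patient_position": ["supine", "prone"],
    "implant_present": ["no", "yes"],
}
# Hypothetical training records, already mapped to the ontology terms.
samples = [
    {"patient_position": "supine", "implant_present": "no"},
    {"patient_position": "supine", "implant_present": "yes"},
    {"patient_position": "prone", "implant_present": "no"},
]

coverage, missing = situation_coverage(dimensions, samples)
print(coverage, missing)
```

In this assumed example, three of four situations are covered and the combination of prone position with an existing implant is reported as missing, which ties back to the knee implant question raised for the intended purpose.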

b) “Solution level” AI system specification

VDE-AR-E 2842-61 also offers solutions for the risks described above resulting from specification gaps and unknown situations.

Problem: Imprecise and ambiguous specifications

Solutions in VDE-AR-E 2842-61: Modeling and notations (see above):

  • Ontologies
  • BPMN/SysML for the description of the black box
  • Acceptance criteria covering performance and all “trustworthiness aspects.” These acceptance criteria also indicate the coverage levels that must be achieved in the tests

Problem: The division of tasks and responsibilities between users and the AI system is not precisely defined (e.g., between the surgeon and surgical robot)

Solutions in VDE-AR-E 2842-61: Defined notation (e.g., BPMN/SysML) to model a “solution concept” for the system black box.

Problem: Specific risks caused by different situations, e.g., availability of system components

Solutions in VDE-AR-E 2842-61:

  • Dynamic risk management, as described in the article on autonomous systems.
  • Multidimensional risk analysis (trustworthiness = safety + security + usability + ethics + ..., in this case: safe + effective + secure) and definition of conflict-free development objectives (= trustworthiness goals) to cover all risks

Problem: AI is used as a placeholder for unclear functionality or technical implementation

Solutions in VDE-AR-E 2842-61: Formulate a functional model of the AI system based on sense-plan-act or another cognitive theory. This also provides the “white box” model of the [text missing] in the next phase.

c) “System level”: when designing the system architectures of AI systems

VDE-AR-E 2842-61 also offers solutions for typical risks during the development of AI systems.

Problem: The output of data-driven models is subject to uncertainty.

Solutions in VDE-AR-E 2842-61:

  • Use of an “uncertainty wrapper” to estimate these uncertainties and account for them in autonomous/cognitive behavior.
  • Use of safety margins

Problem: The design does not sufficiently take into account the requirements of the specification.

Solutions in VDE-AR-E 2842-61:

  • Document traceability
  • Structured and hierarchical development

Problem: Selected architecture is not the “best”

Solutions in VDE-AR-E 2842-61: Use design patterns to demonstrate trustworthiness and achieve certain AI-relevant product properties (continuous learning, explainability, etc.)
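
The idea of wrapping a data-driven model so that its uncertainty is estimated and used in the system's behavior can be sketched as follows. This is a simplified illustration under stated assumptions, not the wrapper architecture described in the application rule: uncertainty is approximated by the disagreement within a small ensemble, and the names `uncertainty_wrapper`, `WrappedOutput`, `max_uncertainty` and `margin_factor` are hypothetical.

```python
from dataclasses import dataclass
from statistics import mean, pstdev
from typing import Callable, Sequence


@dataclass
class WrappedOutput:
    estimate: float      # mean of the ensemble predictions
    uncertainty: float   # spread of the ensemble predictions
    conservative: float  # estimate reduced by a safety margin
    accepted: bool       # False: caller should switch to a fallback


def uncertainty_wrapper(
    models: Sequence[Callable[[float], float]],
    x: float,
    max_uncertainty: float,
    margin_factor: float = 2.0,
) -> WrappedOutput:
    """Run all models on x and report estimate, spread and a safe value."""
    predictions = [model(x) for model in models]
    estimate = mean(predictions)
    uncertainty = pstdev(predictions)
    return WrappedOutput(
        estimate=estimate,
        uncertainty=uncertainty,
        # Safety margin: plan with less free distance than estimated.
        conservative=estimate - margin_factor * uncertainty,
        accepted=uncertainty <= max_uncertainty,
    )


# Hypothetical models estimating the free distance (in m) to an obstacle.
agreeing = [lambda x: 2.0 * x, lambda x: 2.0 * x + 0.1, lambda x: 2.0 * x - 0.1]
disagreeing = [lambda x: 0.5 * x, lambda x: 4.0 * x]

ok = uncertainty_wrapper(agreeing, 1.0, max_uncertainty=0.2)
rejected = uncertainty_wrapper(disagreeing, 1.0, max_uncertainty=0.2)
print(ok.accepted, rejected.accepted)
```

The design choice here is that the wrapper never hides a doubtful result: the calling system receives the estimate together with its uncertainty and can, for example, hand control back to the user when `accepted` is false.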

d) In machine learning (from specifying data to training models)

Problem

Solutions in VDE-AR-E 2842-61

Proofs of safety are harder to produce

  

Suboptimal model chosen

 
  • Use AI blueprints
  • Restrict the use of AI
 

e) When verifying and validating the AI systems

VDE-AR-E 2842-61 describes solutions for all levels of the verification and validation.

Problem: Incorrect conclusions from test results, e.g., because the trustworthiness assurance case makes a claim based on test results that is not valid, for example because specifics of the actual use context (target application scope) were not taken into account

Solutions in VDE-AR-E 2842-61:

  • Range of methods and support in selection, e.g., statistical testing, clear metrics on the right side of the V-model, NN stress testing with adversarial testing, NN analysis with heat maps etc.
  • Constructive support to help build safety argumentation based on the results / applications of the methods
  • Information on common errors in argumentation

Problem: Incomplete proofs

Solutions in VDE-AR-E 2842-61: Clear definition of these objectives based on the trustworthiness analysis; traceability; and proof in the “trustworthiness assurance case” with suitable, structured argumentation (e.g., with GSN) that uses appropriate tangibles (test reports, analyses, etc.) as evidence.

f) During post-market surveillance

Problem: The assurance cases contain a lot of assumptions about the use context. These could prove to be inaccurate in practice.

Solutions in VDE-AR-E 2842-61:

  • The system is only “authorized” for use according to the intended purpose (incl. use environment)
  • Monitor the actual use of the system during post-market surveillance and continually check whether the assumptions are met
  • Gradual introduction of the systems, if necessary, with monitoring (“bootstrap approach”)
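
Continually checking whether assumptions are met can be sketched as a small monitoring routine. The sketch assumes that each assumption from the assurance case can be written as a predicate over a field usage record; the class `Assumption`, the function `violation_rates`, the record fields and the 5 % alert threshold are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

UsageRecord = Dict[str, float]


@dataclass
class Assumption:
    name: str
    holds: Callable[[UsageRecord], bool]  # True if the record satisfies it


def violation_rates(
    assumptions: List[Assumption], records: List[UsageRecord]
) -> Dict[str, float]:
    """Share of field records that violate each assumption."""
    if not records:
        raise ValueError("no usage records to evaluate")
    return {
        a.name: sum(not a.holds(r) for r in records) / len(records)
        for a in assumptions
    }


# Hypothetical assumptions taken from an assurance case.
assumptions = [
    Assumption("ambient temperature 15-30 C",
               lambda r: 15 <= r["ambient_temp_c"] <= 30),
    Assumption("adult patients only", lambda r: r["patient_age"] >= 18),
]
# Hypothetical usage records collected in the field.
records = [
    {"ambient_temp_c": 21, "patient_age": 54},
    {"ambient_temp_c": 23, "patient_age": 16},
    {"ambient_temp_c": 19, "patient_age": 71},
    {"ambient_temp_c": 22, "patient_age": 33},
]

rates = violation_rates(assumptions, records)
# Assumptions violated in more than 5 % of uses need a closer look.
flagged = [name for name, rate in rates.items() if rate > 0.05]
print(rates, flagged)
```

In this assumed data, one of four uses involves a minor, so the assumption "adult patients only" is flagged and the corresponding claim in the assurance case would need to be re-examined.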
 

4. The VDE-AR-E 2842-61 family of standards

a) Scope

VDE-AR-E 2842-61 is applicable to all industries and all applications that fall into the class of autonomous/cognitive systems, especially AI systems. It also refers to these systems as “systems of systems.” Therefore, in the context of medical device law, “system” would mean “device,” not a system in the sense of Article 22 of the MDR (“Systems and procedure packs”).

However, VDE-AR-E 2842-61 does not make any specific reference to medical devices.

Nevertheless, the standard is also recommended for medical devices to help build safety arguments to use with authorities and notified bodies. It adds new aspects to existing regulations, such as uncertainty.

VDE-AR-E 2842-61 is recommended

  • For the development of new devices / systems
  • For the further development of these devices / systems
  • As a checklist for evaluating new and existing systems

b) Overview of the family of standards

The application rule VDE-AR-E 2842-61 consists of a whole family of standards.

Part | Title | Status
VDE-AR-E 2842-61-1 | Terms and concepts | Available
VDE-AR-E 2842-61-2 | Management | Available
VDE-AR-E 2842-61-3 | Development at Solution Level | Completed, in approval
VDE-AR-E 2842-61-4 | Development at System Level | Expected for 2021-Q2
VDE-AR-E 2842-61-5 | Development at Technology Level | Expected for 2021-Q2
VDE-AR-E 2842-61-6 | After Release of the Solution | Available
VDE-AR-E 2842-61-7 | Application Guide | Postponed

Fig. 4: Overview of the VDE-AR-E 2842-61 family of standards. Source: Figure 2 of VDE-AR-E 2842-61-1

c) Structure and examples

Standard concept: The standards divide the requirements along the life cycle of the device into sections and subsections.

Example: The 3rd part of the standard contains sections 7 “Solution Concept” and 8 “Trustworthiness Concept.”

Standard concept: Every section requires the person responsible to set objectives.

Example: The objectives of the “trustworthiness concept” include:

  • Determining the relevant aspects of trustworthiness
  • Identifying hazards and estimating risks
  • Setting trustworthiness goals (i.e., the objective is to set other objectives)
  • Determining “trustworthiness measures” and assigning the “solution concept”

Standard concept: In order to achieve the respective objective, certain tasks must be completed.

Example: The standard pairs the objective “determining the relevant aspects of trustworthiness” with the corresponding task.

Standard concept: Each task consists of a set of activities.

Example: The 10 activities include identifying relevant standards (e.g., IEC 61508) and defining the trustworthiness aspects (e.g., those of ISO 25020).

Standard concept: For some of these activities, the standard specifies the inputs that have to be taken into account.

Example: To define the trustworthiness aspects, the person(s) responsible must take the user requirements and the regulatory requirements into account.

Standard concept: For these activities, means, such as tools, templates or other resources, can be defined.

Example: These means include, for example, the aforementioned standards and literature references.

Fig. 5: the concept of VDE-AR-E 2842-61 as a UML class diagram
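
The hierarchy described above (sections, objectives, tasks, activities, inputs and means) can also be written down as a small data model. The following is a sketch derived only from the description in this article; the class and field names are assumptions, and the class diagram in the standard may contain further elements or different relationships.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Activity:
    description: str
    inputs: List[str] = field(default_factory=list)  # what must be considered
    means: List[str] = field(default_factory=list)   # tools, templates, etc.


@dataclass
class Task:
    description: str
    activities: List[Activity] = field(default_factory=list)


@dataclass
class Objective:
    description: str
    tasks: List[Task] = field(default_factory=list)


@dataclass
class Section:
    part: int
    number: int
    title: str
    objectives: List[Objective] = field(default_factory=list)


# The example from the article: part 3, section 8 (abridged).
define_aspects = Activity(
    description="Define the trustworthiness aspects",
    inputs=["user requirements", "regulatory requirements"],
    means=["ISO 25020"],
)
identify_standards = Activity(
    description="Identify relevant standards",
    means=["IEC 61508"],
)
section_3_8 = Section(
    part=3,
    number=8,
    title="Trustworthiness Concept",
    objectives=[
        Objective(
            description="Determine the relevant aspects of trustworthiness",
            tasks=[Task(
                description="Determine the relevant aspects of trustworthiness",
                activities=[identify_standards, define_aspects],
            )],
        )
    ],
)

# Traceability check: list every activity reachable from the section.
all_activities = [
    a.description
    for o in section_3_8.objectives
    for t in o.tasks
    for a in t.activities
]
print(all_activities)
```

Modeling the structure this way makes it possible to check mechanically, for example, that every objective has at least one task and every task at least one activity.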

5. Is VDE-AR-E 2842-61 binding?

VDE-AR-E 2842-61 is not a harmonized standard. No harmonization is planned either. The likelihood of an auditor or reviewer at a notified body requiring this family of standards as the state of the art is (still) low.

The concepts of the family of standards complement those of ISO 14971, IEC 60601 and IEC 62304 well. This applies in particular to

  • Risk management
  • The risk-based approach
  • The documentation or development and life cycle model (which seems to be based on the V-model)

6. Summary

a) The good

The VDE-AR-E 2842-61 family of standards takes a very systematic approach. It is built on a data model and uses clear and largely comprehensive terminology.

It is also good that it covers the entire lifecycle of autonomous/cognitive systems, such as AI systems, and aligns its structure with these lifecycle phases. This makes assignment easier.

The authors are obviously experts in AI systems who think in logical structures and concepts.

b) What gives pause for thought

VDE-AR-E 2842-61 is not a sector-specific standard. Therefore, the reader is sometimes not quite sure how to implement the concepts presented in accordance with ISO 14971. This is also due to the fact that the family of standards does not (yet) define key terms such as “risk” and “hazard,” and does not (yet) specify which solution should be used for which risk.

The seven parts (which have not all been published yet) are, together, several hundred pages long. And, in some places, it all seems a bit academic. Some sentences leave you wishing for more precision:

The person responsible for the solution level shall use the knowledge gained on hazards and degraded modes to define trustworthiness measures to cover all trustworthiness goals and further constraints given by their attributes (e.g. timely detection and control of relevant functional – consider safe state).

Source: VDE-AR-E 2842-61-3, section 8

But the annexes do contain examples. Nevertheless, it would have been better if the seventh part (the “Application Guide”, of all things) had not been put on hold.

Anyone who, like McKinsey consultants, applies the MECE principle and the “pyramid concept” will wonder whether the hierarchy of concepts is sufficiently clear-cut. The following example is from the eighth section of the third part (section 3-8):

Element: Objectives

Example from VDE-AR-E 2842-61: The objectives of this section are: (1) to define the applicable trustworthiness aspects and to integrate relevant analysis methods from other standards;

Comment: That is more of a task than an objective. The actual objective of this section is to develop a “trustworthy solution concept.”

Element: Tasks

Example from VDE-AR-E 2842-61: to define the applicable trustworthiness aspects and to integrate relevant analysis methods from other standards;

Comment: This is exactly the same wording as one of the objectives.

Element: Activities

Example from VDE-AR-E 2842-61: The person responsible for the solution level shall define the applicable trustworthiness aspects.

Comment: This wording is almost exactly the same as the previous wording.

c) Conclusion

Anyone who works in AI system development should not only be familiar with VDE-AR-E 2842-61 but should also use it. It provides a good overview of the state of the art and helps to ensure that no relevant life cycle activities are forgotten.

The family of standards therefore concerns not only development but all lifecycle phases: from defining the intended purpose through to post-market surveillance.

The family of standards combines well with the concepts in ISO 14971, IEC 60601-1 and IEC 61508. It also provides valuable guidance to manufacturers of devices that fall within the scope of IEC 60601-1, even if these medical devices are not autonomous/cognitive systems.

Users of VDE-AR-E 2842-61 must be able to think in abstract concepts and apply the best practices it contains to specific use cases. This requires a high level of skill, but one that should be expected from individuals who develop AI systems for medicine.


VDE-AR-E 2842-61 is available from VDE.

Dr. Rasmus Adler of the Fraunhofer IESE and Dr. Henrik Putzer of fortiss, the research institute of the Free State of Bavaria for software-intensive systems, and Cogitron contributed to this article. Both will be happy to answer any questions.

Author:

Prof. Dr. Christian Johner
