Artificial Intelligence in Medicine
More and more medical devices use artificial intelligence to diagnose patients more precisely and to treat them more effectively. Although many such devices have already been approved (e.g. by the FDA), many regulatory questions remain unanswered.
This article describes what manufacturers whose devices are based on artificial intelligence techniques should pay attention to.
1. Artificial intelligence: what is it?
The terms artificial intelligence (AI), machine learning and deep learning are often used imprecisely or even synonymously.
The term “artificial intelligence” (AI) itself leads to discussions about, for example, whether machines are actually intelligent.
We will use the definition below:
“A machine’s ability to make decisions and perform tasks that simulate human intelligence and behavior. Alternatively
- A branch of computer science dealing with the simulation of intelligent behavior in computers.
- The capability of a machine to imitate intelligent human behavior”
So it is about a machine's ability to take on tasks or make decisions in a way that simulates human intelligence and behavior.
A lot of artificial intelligence techniques use machine learning, which is defined as follows:
“A facet of AI that focuses on algorithms, allowing machines to learn and change without being programmed when exposed to new data.”
And deep learning is, in turn, part of machine learning and is based on neural networks (see Fig. 1).
“The ability for machines to autonomously mimic human thought patterns through artificial neural networks composed of cascading layers of information.”
This gives us the following taxonomy:
Fig. 1: Artificial intelligence is based on numerous techniques, of which machine learning is only one part. Neural networks, and with them deep learning, are in turn part of machine learning.
The assumption that artificial intelligence in medicine mainly uses neural networks is not correct. A study by Jiang et al. showed that support vector machines are used most frequently (see Fig. 2). Some medical devices use several methods at the same time.
Fig. 2: Most artificial intelligence techniques used in medical devices fall into the “machine learning” category. Neural networks are the second most popular among manufacturers. (Source)
2. Applications of artificial intelligence in medicine
Manufacturers use artificial intelligence, especially machine learning, for tasks such as the following:
Task | Data
Detecting a retinopathy | Images of the eye fundus
Counting and recognizing certain cell types | Images of histological sections
Diagnosis of heart attacks, Alzheimer's, cancer, etc. | Radiology images, e.g. CT, MRI
 | Speech, movement patterns
Selection and dosage of medicines | Diagnoses, gene data, etc.
Diagnosis of heart diseases, degenerative brain diseases, etc. | ECG or EEG signals
 | Laboratory values, environmental factors, etc.
Time-of-death prognosis for intensive care patients | Vital signs, laboratory values and other data in the patient's records
Table 1: Comparison of the tasks that can be performed with artificial intelligence and the data used for these tasks
Other applications include:
- Detection, analysis and improvement of signals, e.g. weak and noisy signals
- Extraction of structured data from unstructured text
- Segmentation of tissues, e.g. for irradiation planning
Fig. 3: Segmentation of organs (here a kidney) with the help of artificial intelligence (Source)
b) Tasks: classification and regression
These techniques are used for either classification (assigning an input to a discrete category) or regression (predicting a continuous value).
Examples of classification
- Deciding whether a particular diagnosis applies
- Deciding whether cells are cancer cells or not
- Selecting a medicine
Examples of regression
- Determining the dose of a medicine
- Time-of-death prognosis
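The difference between the two task types can be sketched in a few lines of Python. This is only an illustration: the marker values, body weights and doses are invented toy numbers, and the helper names (fit_threshold, classify, fit_line) are hypothetical. The classifier returns a discrete label, while the regression returns a continuous value.

```python
# Toy data, invented for illustration only (not clinical values).

def fit_threshold(values, labels):
    """Use the midpoint between the two class means as decision threshold."""
    pos = [v for v, y in zip(values, labels) if y == 1]
    neg = [v for v, y in zip(values, labels) if y == 0]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2


def classify(value, threshold):
    """Classification: return a discrete label, 1 (present) or 0 (absent)."""
    return 1 if value >= threshold else 0


def fit_line(xs, ys):
    """Regression: ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Classification example: a hypothetical marker value and a yes/no finding.
marker = [1.0, 1.5, 2.0, 4.0, 4.5, 5.0]
finding = [0, 0, 0, 1, 1, 1]
threshold = fit_threshold(marker, finding)
label = classify(4.2, threshold)

# Regression example: a hypothetical dose as a function of body weight.
weights_kg = [50, 60, 70, 80]
doses_mg = [100, 120, 140, 160]
slope, intercept = fit_line(weights_kg, doses_mg)
predicted_dose = slope * 65 + intercept
```

Real devices use far richer models, but the distinction stays the same: the output of classification is a category, the output of regression is a number.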
3. AI from a regulatory perspective
a) Regulatory requirements
There are currently no laws or harmonized standards that specifically regulate the use of artificial intelligence in medical devices. However, these devices must meet existing regulatory requirements, such as:
- Manufacturers must demonstrate the benefit and performance of the medical device. For devices that are used for diagnostic purposes, for example, the sensitivity and specificity must be demonstrated.
- The devices must be validated against the intended purpose and stakeholder requirements and verified against the specifications (including MDR Annex I 17.2).
- Manufacturers must ensure that the software has been developed in a way that ensures repeatability, reliability and performance (including MDR Annex I 17.1).
- Manufacturers must describe the methods they will use for these verifications.
- If the clinical evaluation is based on a comparator device, this device must be sufficiently technically equivalent, which explicitly includes the evaluation of the software algorithms (MDR Annex XIV, Part A, paragraph 3).
- Before development, manufacturers must determine and ensure the competence of the people involved (ISO 13485:2016 7.3.2 f).
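Sensitivity and specificity, mentioned in the first requirement above, can be computed from a comparison of the device's output with the reference diagnosis. The following is a minimal sketch, assuming binary labels (1 = condition present, 0 = condition absent); the function name and the example data are hypothetical.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels 1 and 0."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative case")
    sensitivity = tp / (tp + fn)  # share of diseased cases that are detected
    specificity = tn / (tn + fp)  # share of healthy cases that are cleared
    return sensitivity, specificity


# Invented example: 4 diseased and 6 healthy cases.
reference = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
device = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
sens, spec = sensitivity_specificity(reference, device)
```

Both values depend on the test population and on the reference standard used, so they should always be reported together with that context.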
b) Regulatory challenges
Manufacturers regularly find it difficult to prove that the requirements placed on the device, e.g. with regard to accuracy, correctness and robustness, have been met.
Dr. Rich Caruana, one of Microsoft's leading minds in artificial intelligence, advised against the use of a neural network he had developed himself to propose an appropriate therapy for pneumonia patients:
“I said no. I said we don’t understand what it does inside. I said I was afraid.” Dr. Rich Caruana, Microsoft
The questions that auditors should ask manufacturers include, for example:
What makes you assume that your training data is free of bias?
Otherwise the results would be wrong or only correct under certain conditions.
How did you avoid overfitting your model?
Otherwise, the algorithm would only correctly predict the data it was trained with.
What makes you assume that the results are not just randomly correct?
For example, an algorithm might correctly decide that an image contains a house, yet have recognized not the house but the sky. Another example is shown in Fig. 4.
What requirements does the data have to meet in order for your system to classify it correctly or to predict the results correctly? Which framework conditions must be observed?
Since the model was trained with a certain quantity of data, it can only make correct predictions for data coming from the same population.
Would you not have achieved a better result with another model or with other hyperparameters?
Manufacturers must minimize risks as far as possible. These also include risks resulting from incorrect predictions made by sub-optimal models.
Why do you assume that you have used enough training data?
Collecting, processing and “labeling” training data is time-consuming. The more data that is used to train a model, the more powerful it can be.
What gold standard did you use when labeling the training data? Why do you consider the chosen standard to be the gold standard?
Particularly if the machine starts to be superior to people, it becomes difficult to determine whether a physician, a group of “normal” physicians, or the world's best experts in a discipline are the reference.
How can you ensure reproducibility if your system continues to learn?
For continuous learning systems (CLS) in particular, manufacturers must ensure that further training at the very least does not reduce performance.
Have you validated systems that you are using to collect, prepare, and analyze data, and to train and validate your models?
An essential part of the work consists of collecting and processing the training data and using it to train the model. The software needed for this is not part of the medical device. However, it is subject to the requirements for computerized systems validation.
Table 2: Aspects that should be addressed in the review of medical devices, with associated explanations
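The overfitting question in Table 2 can be illustrated with a deliberately extreme case. The sketch below uses a 1-nearest-neighbour classifier, which memorizes its training data, on invented one-dimensional data that contains two mislabeled training points. The data and function names are hypothetical; the point is only that a large gap between accuracy on training data and accuracy on held-out data is a typical sign of overfitting.

```python
def predict_1nn(train, x):
    """Return the label of the training point closest to x."""
    nearest = min(train, key=lambda point: abs(point[0] - x))
    return nearest[1]


def accuracy(train, dataset):
    """Fraction of (value, label) pairs in dataset that are predicted correctly."""
    correct = sum(1 for x, label in dataset if predict_1nn(train, x) == label)
    return correct / len(dataset)


# Invented data: the true label is 1 for values above 0.5, otherwise 0.
# The training points at 0.3 and 0.7 carry wrong labels (label noise).
train_set = [(0.1, 0), (0.2, 0), (0.3, 1), (0.4, 0),
             (0.6, 1), (0.7, 0), (0.8, 1), (0.9, 1)]
held_out_set = [(0.15, 0), (0.28, 0), (0.32, 0), (0.45, 0),
                (0.55, 1), (0.68, 1), (0.72, 1), (0.85, 1)]

train_accuracy = accuracy(train_set, train_set)        # model has memorized the noise
held_out_accuracy = accuracy(train_set, held_out_set)  # noise hurts on new data
gap = train_accuracy - held_out_accuracy
```

In practice the held-out data must never be used for training or tuning, otherwise the comparison loses its meaning.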
These questions are typically also discussed as part of the ISO 14971 risk management process and the clinical evaluation according to MEDDEV 2.7/1 Revision 4.
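For continuous learning systems, one conceivable safeguard is to compare every retrained model with the current one on a fixed ("locked") validation data set before releasing it. The following sketch only illustrates this idea; it is not a regulatory requirement, and the function names, the threshold models and the data are assumptions made for the example.

```python
def accuracy(model, dataset):
    """Fraction of (input, label) pairs the model predicts correctly."""
    correct = sum(1 for x, label in dataset if model(x) == label)
    return correct / len(dataset)


def accept_update(current_model, candidate_model, locked_validation_set):
    """Accept a retrained model only if it is at least as accurate as before."""
    return (accuracy(candidate_model, locked_validation_set)
            >= accuracy(current_model, locked_validation_set))


# Hypothetical models: simple thresholds on a single measurement.
def current_model(x):
    return 1 if x >= 3.0 else 0


def drifted_model(x):
    return 1 if x >= 4.4 else 0


# Invented validation data that is never used for training.
locked_validation_set = [(1.0, 0), (2.0, 0), (2.8, 0),
                         (3.5, 1), (4.2, 1), (5.0, 1)]

release_drifted = accept_update(current_model, drifted_model,
                                locked_validation_set)
```

Here the drifted model misclassifies two positive cases, so the update is rejected. A real acceptance criterion would usually cover more than accuracy, e.g. sensitivity and specificity per subgroup.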
Fig. 4: Input data that resembles a certain pattern only by chance, in this example a Chihuahua and a muffin (source)
c) Approaches to solutions
Auditors should no longer simply accept the statement that machine learning techniques are black boxes. The current research literature shows how manufacturers can explain the functionality and “inner workings” of devices and make them transparent for users, authorities and notified bodies alike.
For example, Layer-wise Relevance Propagation makes it possible to recognize which input data (“features”) were decisive for the algorithm's result, e.g. for a classification.
In the left picture, Fig. 5 shows that the algorithm can rule out the digit “6” primarily because of the pixels marked dark blue. This makes sense, because in a “6” this area is typically empty. The right image, on the other hand, shows in red the pixels that reinforce the algorithm's assumption that the digit is a “1”.
The algorithm evaluates the pixels in the rising part of the digit as evidence against a classification as “1”. This is because it was trained with images in which the “1” is written as a simple vertical line, as is customary in the USA. This shows how important it is for the result that the training data is representative of the data that is to be classified later.
Fig. 5: Layer-wise Relevance Propagation determines which input is responsible for which share of the result. The data are visualized here as a heat map (source).
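The principle behind Layer-wise Relevance Propagation can be sketched on a very small network. The example below assumes a two-layer network without bias terms, a ReLU hidden layer and the so-called epsilon rule; all weights and inputs are hypothetical values chosen so that the arithmetic is easy to follow. It is a simplified illustration, not the implementation behind Fig. 5.

```python
EPS = 1e-9  # small stabilizer that avoids division by zero


def stabilize(z):
    """Shift z slightly away from zero while keeping its sign."""
    return z + (EPS if z >= 0 else -EPS)


def forward(x, w1, w2):
    """Two-layer network without biases: ReLU hidden layer, scalar output."""
    z_hidden = [sum(x[i] * w1[i][j] for i in range(len(x)))
                for j in range(len(w2))]
    a_hidden = [max(0.0, z) for z in z_hidden]
    output = sum(a * w for a, w in zip(a_hidden, w2))
    return z_hidden, a_hidden, output


def lrp(x, w1, w2):
    """Redistribute the output score back onto the input features."""
    z_hidden, a_hidden, output = forward(x, w1, w2)
    # Output -> hidden: each hidden unit receives its share of the output.
    r_hidden = [a * w / stabilize(output) * output
                for a, w in zip(a_hidden, w2)]
    # Hidden -> input: each input receives its share of each hidden relevance.
    r_input = [
        sum(x[i] * w1[i][j] / stabilize(z_hidden[j]) * r_hidden[j]
            for j in range(len(w2)))
        for i in range(len(x))
    ]
    return r_input, output


x = [1.0, 2.0, 0.5]                             # hypothetical input features
w1 = [[1.0, -1.0], [0.5, 1.0], [-2.0, 0.0]]     # input -> hidden weights
w2 = [2.0, -1.0]                                # hidden -> output weights
relevance, score = lrp(x, w1, w2)
```

Positive relevance values speak for the result, negative values against it, which corresponds to the red and blue pixels in the heat map. Up to the small stabilizer, the relevance values add up to the output score.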
The free online book “Interpretable Machine Learning” by Christoph Molnar, who is one of the keynote speakers at Institute Day 2019, is particularly worth a read.
The Johner Institute supports manufacturers of medical devices that use artificial intelligence, for example in the following areas:
- Developing and marketing the device in compliance with the law
- Planning and performing corresponding verification and validation activities
- Evaluating the device's benefits, performance and safety
- Evaluating the suitability of the techniques (in particular the models) and the training data
- Complying with the regulatory requirements for the post-market phase and
- Creating tailor-made procedural instructions.
5. Conclusion, outlook
a) From hype to actual practice via disillusionment
Artificial intelligence is currently receiving a lot of hype. Many “articles” praise it as the solution to every medical problem or condemn it as the start of a dystopia in which machines will take over. We are facing a period of disillusionment. “Dr. Watson versagt” [“Dr. Watson fails”] was the title of an article in issue 32/2018 of Der Spiegel on the use of AI in medicine.
It has to be expected that the media will write over-the-top and scandalized reports on cases where bad AI decisions have tragic consequences. But over time, the use of AI will become just as normal and indispensable as the use of electricity. We can no longer afford and no longer want to pay for medical staff to perform tasks that computers can do better and faster.
b) Regulatory uncertainty
The regulatory framework and best practices lag behind the use of AI. This leads to risks for patients (medical devices are less safe) and for manufacturers (audits and approval procedures seem to reach arbitrary conclusions).
In 2019, the Johner Institute, together with notified bodies, will publish a guideline for the safe development and use of artificial intelligence, comparable to the IT Security Guideline.