Many current state-of-the-art methods for anomaly detection in medical images rely on calculating a residual image between a potentially anomalous input image and its (“healthy”) reconstruction. Because the model has never seen the anomalous region, its reconstruction there should be erroneous, which yields large residuals that serve as an anomaly score. However, this assumption ignores residuals that result from imperfect reconstructions by the machine learning models used. Such errors can easily overshadow the residuals of interest and therefore call into question the use of residual images as a scoring function. Our work explores this fundamental problem of residual images in detail. We define the problem theoretically and thoroughly evaluate the influence of the intensity and texture of anomalies against the effect of imperfect reconstructions in a series of experiments.
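One way to make the competition between the two kinds of residual explicit is sketched below. The notation is an assumption made here for illustration and is not necessarily the one used in the paper: $h$ is the healthy image, $a$ the anomalous content, $m \in \{0, 1\}$ a pixel-wise anomaly mask, $\hat{x}$ the reconstruction, and $\epsilon$ its reconstruction error.

```latex
\begin{align}
x       &= (1 - m) \odot h + m \odot a, \\
\hat{x} &= h + \epsilon, \\
r       &= |x - \hat{x}| = |m \odot (a - h) - \epsilon| .
\end{align}
```

Under these assumptions, healthy pixels ($m = 0$) receive the score $|\epsilon|$ and anomalous pixels ($m = 1$) receive $|a - h - \epsilon|$. The residual separates the two only where the intensity contrast $|a - h|$ is large compared with the reconstruction error $|\epsilon|$.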
In this work, we showed experimentally that image-reconstruction-based anomaly detection models cannot detect anomalies that are not characterized by hypo- or hyperintensity.
Fig. 1. If the reconstruction of an image is only slightly imperfect (simulated with very light Gaussian blurring), detection performance for anomalies with intensities between 0.2 and 0.8 decreases dramatically.
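The effect described in this caption can be illustrated with a small synthetic sketch. Everything in it is an assumption made for illustration and not the paper's experimental setup: the sinusoidal "healthy" texture with values in [0.4, 0.6], the 32 x 32 image size, the square anomaly, the blur width `sigma=1.0`, and the helper names `blur`, `auroc`, and `detection_auroc`. The imperfect reconstruction is simulated by blurring the anomaly-free image, and each pixel is scored by its absolute residual.

```python
import math


def gaussian_kernel(sigma, radius):
    """Normalised 1-D Gaussian kernel with 2 * radius + 1 taps."""
    weights = [math.exp(-(k * k) / (2.0 * sigma * sigma))
               for k in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def blur(image, sigma, radius=3):
    """Separable Gaussian blur with edge replication at the borders."""
    kernel = gaussian_kernel(sigma, radius)
    height, width = len(image), len(image[0])

    def clamp(i, n):
        return min(max(i, 0), n - 1)

    # Horizontal pass followed by a vertical pass.
    rows = [[sum(w * row[clamp(x + k - radius, width)]
                 for k, w in enumerate(kernel))
             for x in range(width)] for row in image]
    return [[sum(w * rows[clamp(y + k - radius, height)][x]
                 for k, w in enumerate(kernel))
             for x in range(width)] for y in range(height)]


def auroc(scores, labels):
    """Pixel-wise AUROC via pairwise comparison (ties count as one half)."""
    pos = [s for s, is_anomaly in zip(scores, labels) if is_anomaly]
    neg = [s for s, is_anomaly in zip(scores, labels) if not is_anomaly]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


SIZE, PERIOD = 32, 8
# Hypothetical "healthy" texture with values in [0.4, 0.6].
healthy = [[0.5 + 0.1 * math.sin(2 * math.pi * x / PERIOD)
            * math.sin(2 * math.pi * y / PERIOD)
            for x in range(SIZE)] for y in range(SIZE)]
# Assumption: a slightly imperfect reconstruction is a lightly blurred copy
# of the anomaly-free image.
reconstruction = blur(healthy, sigma=1.0)


def detection_auroc(intensity, top=12, left=12, side=8):
    """Paste a uniform square anomaly and score pixels by absolute residual."""
    scores, labels = [], []
    for y in range(SIZE):
        for x in range(SIZE):
            inside = top <= y < top + side and left <= x < left + side
            pixel = intensity if inside else healthy[y][x]
            scores.append(abs(pixel - reconstruction[y][x]))
            labels.append(inside)
    return auroc(scores, labels)


results = {i: detection_auroc(i) for i in (0.0, 0.5, 1.0)}
for intensity, value in results.items():
    print(f"anomaly intensity {intensity:.1f}: AUROC {value:.2f}")
```

In this toy setting the extreme intensities 0.0 and 1.0 are separated perfectly, because their residuals are at least 0.4 while healthy residuals stay below 0.2. An anomaly at intensity 0.5 produces residuals of the same size as the blur-induced errors on healthy texture, so its AUROC falls far below that.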
Fig. 2. The findings from Fig. 1 are confirmed by experiments with real anomaly detection models, which also fail to detect anomalies that are not hyperintense.
Fig. 3. This figure shows the capacity trade-off that image-reconstruction models face. If the capacity is too low, the reconstruction will be imperfect (AE, first two columns). If the capacity is too high, the model will simply reconstruct the anomalies as well (VQ-VAE, third column). Currently, there is no sweet spot for this problem.