Fluid-attenuated inversion recovery (FLAIR) brain MRI scans are widely used to benchmark new anomaly detection models. In this work, we show that simple thresholding outperforms all current models (including an ensemble of 8 vision transformers) on four popular datasets. Our findings question the usefulness of FLAIR MRI for benchmarking anomaly detection models.
Fig. 1. Qualitative results of our baseline. Two samples are shown for each dataset. Top row: input image. Middle row: predicted anomaly map, binarized using the threshold that yields the best DSC for each dataset. Bottom row: ground truth anomaly segmentation.
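To make the binarization step concrete, the following is a minimal sketch of a thresholding baseline scored by the best achievable Dice similarity coefficient (DSC). It assumes that anomalies appear brighter than healthy tissue, that intensities are scaled to [0, 1], and that ⌈DSC⌉ denotes the DSC at the best threshold, as the caption of Fig. 1 suggests. The function names `dice` and `best_dice`, the threshold grid, and the toy data are illustrative and not taken from this work; any preprocessing applied before thresholding is not covered here.

```python
from typing import Optional, Sequence, Tuple


def dice(pred: Sequence[int], target: Sequence[int]) -> float:
    """Dice similarity coefficient between two flattened binary masks."""
    tp = sum(1 for p, t in zip(pred, target) if p and t)
    denom = sum(1 for p in pred if p) + sum(1 for t in target if t)
    # Convention (an assumption): two empty masks count as a perfect match.
    return 1.0 if denom == 0 else 2.0 * tp / denom


def best_dice(
    intensities: Sequence[float],
    target: Sequence[int],
    thresholds: Sequence[float],
) -> Tuple[float, Optional[float]]:
    """Return (best DSC, threshold achieving it) over the candidate thresholds."""
    best_score, best_th = -1.0, None
    for th in thresholds:
        # Voxels brighter than the threshold are marked as anomalous.
        pred = [1 if v > th else 0 for v in intensities]
        score = dice(pred, target)
        if score > best_score:  # keep the first threshold that reaches the maximum
            best_score, best_th = score, th
    return best_score, best_th


# Toy data (hypothetical): flattened intensities and a ground truth mask.
intensities = [0.10, 0.20, 0.85, 0.90, 0.30, 0.95, 0.40, 0.15]
target = [0, 0, 1, 1, 0, 1, 0, 0]
thresholds = [i / 10 for i in range(10)]  # 0.0, 0.1, ..., 0.9

score, th = best_dice(intensities, target, thresholds)
```

The sketch uses only the standard library to stay self-contained; on real volumes one would typically operate on arrays and pool all voxels of a dataset before searching for the threshold, since the caption states that the threshold is chosen per dataset.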
Tab. 1. Comparison of thresholding to current models
Method | MSLUB ⌈DSC⌉ | MSLUB AUPRC | MSLUB AUROC | MSSEG2015 ⌈DSC⌉ | MSSEG2015 AUPRC | MSSEG2015 AUROC |
---|---|---|---|---|---|---|
AE (dense) | 0.271 | 0.163 | 0.794 | 0.185 | 0.080 | 0.879 |
AE (spatial) | 0.154 | 0.065 | 0.732 | 0.106 | 0.037 | 0.781 |
VAE (restoration) | 0.333 | 0.275 | 0.839 | 0.272 | 0.202 | 0.905 |
GMVAE (restoration) | 0.332 | 0.271 | 0.836 | 0.280 | 0.199 | 0.909 |
f-AnoGAN | 0.283 | 0.221 | 0.856 | 0.342 | 0.255 | 0.923 |
SSAE (spatial) | 0.301 | 0.222 | - | - | - | - |
Ours | 0.374 | 0.271 | 0.991 | 0.431 | 0.262 | 0.996 |
Tab. 2. Comparison of thresholding to an ensemble of 8 vision transformers
Method | BraTS ⌈DSC⌉ | MSLUB ⌈DSC⌉ | WMH ⌈DSC⌉ |
---|---|---|---|
Transformer | 0.759 | 0.465 | 0.441 |
Ours | 0.738 | 0.613 | 0.557 |