Med Phys. 2022 May 28. doi: 10.1002/mp.15777. Online ahead of print.
BACKGROUND: Prostate cancer remains the second deadliest cancer for American men despite clinical advancements. Currently, Magnetic Resonance Imaging (MRI) is considered the most sensitive non-invasive imaging modality that enables visualization, detection and localization of prostate cancer, and is increasingly used to guide targeted biopsies for prostate cancer diagnosis. However, its utility remains limited due to high rates of false positives and false negatives as well as low inter-reader agreements.
PURPOSE: Machine learning methods to detect and localize cancer on prostate MRI can help standardize radiologist interpretations. However, existing machine learning methods vary not only in model architecture, but also in the ground truth labeling strategies used for model training. We compare different labeling strategies and the effects they have on the performance of different machine learning models for prostate cancer detection on MRI.
METHODS: Four different deep learning models (SPCNet, U-Net, branched U-Net, and DeepLabv3+) were trained to detect prostate cancer on MRI using 75 patients with radical prostatectomy, and evaluated using 40 patients with radical prostatectomy and 275 patients with targeted biopsy. Each deep learning model was trained with four different label types: pathology-confirmed radiologist labels, pathologist labels on whole-mount histopathology images, and lesion-level and pixel-level digital pathologist labels (previously validated deep learning algorithm on histopathology images to predict pixel-level Gleason patterns) on whole-mount histopathology images. The pathologist and digital pathologist labels (collectively referred to as pathology labels) were mapped onto pre-operative MRI using an automated MRI-histopathology registration platform.
RESULTS: Radiologist labels missed cancers (ROC-AUC: 0.75 – 0.84), had lower lesion volumes (~68% of pathology lesions), and lower Dice overlaps (0.24 – 0.28) when compared with pathology labels. Consequently, machine learning models trained with radiologist labels also showed inferior performance compared to models trained with pathology labels. Digital pathologist labels showed high concordance with pathologist labels of cancer (lesion ROC-AUC: 0.97 – 1, lesion Dice: 0.75 – 0.93). Machine learning models trained with digital pathologist labels had the highest lesion detection rates in the radical prostatectomy cohort (aggressive lesion ROC-AUC: 0.91 – 0.94), and had generalizable and comparable performance to pathologist label trained-models in the targeted biopsy cohort (aggressive lesion ROC-AUC: 0.87 – 0.88), irrespective of the deep learning architecture. Moreover, machine learning models trained with pixel-level digital pathologist labels were able to selectively identify aggressive and indolent cancer components in mixed lesions on MRI, which is not possible with any human-annotated label type.
CONCLUSIONS: Machine learning models for prostate MRI interpretation that are trained with digital pathologist labels showed higher or comparable performance with pathologist label-trained models in both radical prostatectomy and targeted biopsy cohort. Digital pathologist labels can reduce challenges associated with human annotations, including labor, time, inter- and intra-reader variability, and can help bridge the gap between prostate radiology and pathology by enabling the training of reliable machine learning models to detect and localize prostate cancer on MRI. This article is protected by copyright. All rights reserved.