Nationwide real-world implementation of AI for cancer detection in population-based mammography screening
https://doi.org/10.1038/s41591-024-03408-6
Abstract
Artificial intelligence (AI) in mammography screening has shown promise in retrospective evaluations, but few prospective studies exist. PRAIM is an observational, multicenter, real-world, noninferiority, implementation study comparing the performance of AI-supported double reading to standard double reading (without AI) among women (50–69 years old) undergoing organized mammography screening at 12 sites in Germany. Radiologists in this study voluntarily chose whether to use the AI system. From July 2021 to February 2023, a total of 463,094 women were screened (260,739 with AI support) by 119 radiologists. Radiologists in the AI-supported screening group achieved a breast cancer detection rate of 6.7 per 1,000, which was 17.6% (95% confidence interval: +5.7%, +30.8%) higher than and statistically superior to the rate (5.7 per 1,000) achieved in the control group. The recall rate in the AI group was 37.4 per 1,000, which was lower than and noninferior to that (38.3 per 1,000) in the control group (percentage difference: −2.5% (−6.5%, +1.7%)). The positive predictive value (PPV) of recall was 17.9% in the AI group compared to 14.9% in the control group. The PPV of biopsy was 64.5% in the AI group versus 59.2% in the control group. Compared to standard double reading, AI-supported double reading was associated with a higher breast cancer detection rate without negatively affecting the recall rate, strongly indicating that AI can improve mammography screening metrics.
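The headline comparisons can be reproduced approximately from the per-1,000 rates reported above; a minimal sketch (values taken from the abstract, so the percentage differences differ slightly from the paper's count-based figures of +17.6% and −2.5%):

```python
# Screening metrics from the PRAIM abstract (per 1,000 women screened).
# The paper's exact percentage differences come from the underlying
# counts; recomputing from the rounded rates gives close values.

def pct_diff(ai_rate, control_rate):
    """Relative difference of the AI group versus the control group, in %."""
    return (ai_rate - control_rate) / control_rate * 100

cdr_ai, cdr_ctrl = 6.7, 5.7          # cancer detection rate per 1,000
recall_ai, recall_ctrl = 37.4, 38.3  # recall rate per 1,000

print(f"CDR difference:    {pct_diff(cdr_ai, cdr_ctrl):+.1f}%")       # ~ +17.5%
print(f"Recall difference: {pct_diff(recall_ai, recall_ctrl):+.1f}%") # ~ -2.3%
```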
Fine-Tuning on AI-Driven Video Analysis through Machine Learning: Development of an Automated Evaluation Tool of Facial Palsy
https://doi.org/10.1097/PRS.0000000000011924
Abstract
Background:

Establishing a quantitative, objective evaluation tool for facial palsy has been a long-standing challenge for clinicians and researchers, and artificial intelligence–driven video analysis is a reasonable candidate solution. The authors introduced facial keypoint detection, which locates 68 facial landmarks, but existing models had been trained almost exclusively on images of healthy individuals, so low accuracy was presumed when predicting the asymmetric faces of patients with facial palsy. The accuracy of the existing model was assessed by applying it to videos of 30 patients with facial palsy. Qualitative review clearly showed its insufficiency: the model tended to detect patients' faces as symmetric and was unable to detect eye closure. Thus, the authors enhanced the model through the machine-learning process of annotation (ie, fine-tuning).
Methods:
A total of 1181 images extracted from the videos of 196 patients were used for training, and the 68 keypoints in each image were manually corrected. The annotated data were integrated into the previous model, a stack of 2 hourglass networks combined with a channel aggregation block.
Results:
The postannotation model improved the normalized mean error from 0.026 to 0.018, and qualitative review of keypoint detection on each facial unit also showed improvement.
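Normalized mean error (NME), the metric reported above, averages the point-wise prediction error and scales it by a reference distance so that faces of different sizes are comparable; the interocular distance is a common normalizer for the 68-point landmark scheme, though the abstract does not state which normalization the authors used. A minimal sketch:

```python
import numpy as np

def normalized_mean_error(pred, gt, norm_dist):
    """Mean Euclidean error over the 68 keypoints, divided by a
    reference distance (e.g. interocular distance)."""
    errors = np.linalg.norm(pred - gt, axis=-1)  # (68,) per-point errors
    return errors.mean() / norm_dist

# Hypothetical example: each predicted point is offset by (3, 4) px,
# i.e. a 5 px error per point; with a 10 px normalizer, NME = 0.5.
gt = np.zeros((68, 2))
pred = gt + np.array([3.0, 4.0])
print(normalized_mean_error(pred, gt, 10.0))  # -> 0.5
```

In the 68-point scheme, the outer eye corners (indices 36 and 45) are often used to compute the interocular normalization distance.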
Conclusions:
Strict control of inter- and intra-annotator variability successfully fine-tuned the presented model. The new model is a promising solution for objective assessment of facial palsy.