Fewer False-positive Mammograms May Need Human-Computer Mix

Artificial intelligence (AI) could be combined with radiologist assessments to improve the accuracy of breast cancer screenings, a study suggests.

The study, “Evaluation of Combined Artificial Intelligence and Radiologist Assessment to Interpret Screening Mammograms,” was published in JAMA Network Open.

Mammograms are commonly used to screen for breast cancer, and their images are interpreted by radiologists. This introduces an element of human error that may contribute to the relatively high false-positive rate in diagnoses: about 1 in 10 people who undergo a mammogram are called back for additional testing. On average, only about 1 in 20 of those called back will have breast cancer; the rest are false positives.

Some have suggested that using AI (that is, complex computer algorithms) to analyze mammography images could improve diagnostic accuracy by removing elements of human fallibility.

The Dialogue on Reverse Engineering Assessment and Methods (DREAM) initiative has run many competitions aimed at generating computer-based methods to improve healthcare for various diseases.

In this study, researchers reported on findings from the Digital Mammography DREAM Challenge, which aimed to generate algorithms to improve the accuracy of mammography-based breast cancer screening.

The challenge was co-organized by IBM, Sage Bionetworks, Kaiser Permanente Washington, and others, with funding from the Arnold Foundation.

Overall, the challenge drew 1,100 participants, organized into 126 teams from 44 countries.

Competitors submitted algorithms to be tested on two datasets, together containing data on more than 300,000 mammogram examinations done on over 150,000 individuals. For patient confidentiality purposes, data were maintained behind a firewall; competitors did not have direct access to them. This also helped avoid the ‘self-assessment trap,’ where algorithm developers judge how well their own algorithm performs, a process in which it’s nearly impossible to be unbiased.

“At the end of the competition of the DREAM Challenge, we found the sobering result that no single algorithm had a lower false positive rate than the radiologists in that same dataset,” two of the study’s co-authors, Gustavo Stolovitzky, PhD, and Rami Ben-Ari, PhD, both researchers at IBM, wrote in a blog post.

For instance, the best-performing algorithm in one of the datasets had a 33.7% false-positive rate. That rate for the radiologist assessing the same dataset was 9.5%. Similar results were found for the other dataset.
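
As a rough illustration of what these numbers measure, a false-positive rate can be computed from a model’s scores and the follow-up outcomes. The sketch below is a minimal example; the threshold, toy data, and function name are hypothetical, not the challenge’s actual evaluation code:

```python
import numpy as np

def false_positive_rate(y_true, y_score, threshold):
    """Share of cancer-free exams the model would flag for recall."""
    flagged = y_score >= threshold
    cancer_free = y_true == 0
    return flagged[cancer_free].mean()

# Toy data: 1 = cancer confirmed in follow-up, 0 = cancer-free
y_true = np.array([0, 0, 1, 0, 1, 0, 0, 0])
y_score = np.array([0.20, 0.70, 0.90, 0.10, 0.60, 0.40, 0.80, 0.30])

# 2 of the 6 cancer-free exams score above the threshold
print(false_positive_rate(y_true, y_score, threshold=0.5))  # 0.333...
```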

Following the competition phase, the eight top-performing teams were invited to collaborate, learning from each other and integrating their AI algorithms into a single new algorithm, termed the Challenge Ensemble Model (CEM).
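
The paper frames the CEM as an ensemble of the top teams’ models. The precise combination method isn’t detailed in this article, but a minimal sketch of one common approach, averaging the individual models’ predicted recall probabilities per exam, could look like this (the model names and scores are hypothetical):

```python
import numpy as np

# Hypothetical recall probabilities from three top-performing models
# for the same batch of five screening exams.
model_scores = {
    "model_a": np.array([0.10, 0.80, 0.30, 0.60, 0.20]),
    "model_b": np.array([0.15, 0.70, 0.40, 0.55, 0.25]),
    "model_c": np.array([0.05, 0.90, 0.20, 0.65, 0.10]),
}

# Simple ensembling: average each exam's score across the models.
ensemble_score = np.mean(list(model_scores.values()), axis=0)
print(ensemble_score)  # ≈ [0.10, 0.80, 0.30, 0.60, 0.18]
```

Averaging tends to cancel out the idiosyncratic mistakes of any single model, which is one plausible reason the ensemble beat every individual entry.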

The CEM outperformed all the other algorithms; on the same dataset, its false-positive rate was 23.9%. But this was still higher than the radiologists’ false-positive rate.

The researchers then combined the CEM with the radiologists’ assessments, and found that this combined approach reduced the false-positive rate to 8.0%.
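
The article doesn’t spell out the combination rule, but one simple way to fuse a continuous model score with a radiologist’s binary recall decision is a logistic regression over both inputs. The sketch below is an illustration under that assumption, with entirely hypothetical data:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical training data: the ensemble's score and the radiologist's
# recall decision (1 = recalled) for each exam, plus follow-up outcomes.
cem_score = np.array([0.1, 0.8, 0.3, 0.9, 0.2, 0.7, 0.4, 0.6])
radiologist_recall = np.array([0, 1, 0, 1, 1, 0, 0, 1])
cancer_outcome = np.array([0, 1, 0, 1, 0, 1, 0, 0])

# Fit a model that weighs both signals when deciding whom to recall.
X = np.column_stack([cem_score, radiologist_recall])
combined = LogisticRegression().fit(X, cancer_outcome)

# Score a new exam using the algorithm and the radiologist together.
print(combined.predict_proba(np.array([[0.75, 1]]))[:, 1])
```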

“An AI algorithm combined with the single-radiologist assessment was associated with a higher overall mammography interpretive accuracy in independent screening programs compared with a single-radiologist interpretation alone,” the researchers wrote. “Our study suggests that a collaboration between radiologists and an ensemble algorithm may reduce the recall rate from 0.095 to 0.08, an absolute 1.5% reduction.”

An absolute reduction of 1.5 percentage points may not seem like much. But, the researchers pointed out, about 40 million people each year are screened for breast cancer in the U.S. alone.

In this context, 1.5% translates to roughly 600,000 people every year “who would be spared the unnecessary diagnostic work-up, anxiety, and of course cost associated with a recall for further examinations,” Stolovitzky and Ben-Ari wrote.
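
The arithmetic behind that estimate is a simple back-of-the-envelope calculation using the recall rates quoted above:

```python
screened_per_year = 40_000_000   # approximate annual U.S. screening volume
recall_rate_single = 0.095       # single-radiologist recall rate from the study
recall_rate_combined = 0.080     # combined human-AI recall rate from the study

spared = screened_per_year * (recall_rate_single - recall_rate_combined)
print(f"{spared:,.0f} fewer recalls per year")  # 600,000 fewer recalls per year
```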