Are Deep Learning Algorithms Ready for the Clinical Setting?
Is using a deep learning algorithm to detect glaucoma ready for prime time? And by prime time we mean the clinical setting. Not according to Dolly S. Chang, MD, a clinical fellow and clinical instructor at Stanford University School of Medicine in Palo Alto, CA. Performance in a controlled environment is one thing, she said. “But how it performs in a community setting where there is a full spectrum of cases is quite another.” Dr. Chang made her comments after reporting on the results of a study involving more than 1,600 fundus photos during the American Academy of Ophthalmology’s 2018 annual meeting in Chicago.
The bottom line, she noted, is that “the algorithm did not work as well as we thought it might. It was poor in identifying referable glaucoma.”
The images assessed were drawn from the Baltimore Eye Survey, a population-based prevalence study of eye disease performed in East Baltimore from 1985 to 1988. Investigators developed the algorithm and evaluated it on training, validation, and external sets. Among the results:
- 9 in every 10 images were graded as good or adequate in quality.
- The algorithm achieved sensitivity of 86% and specificity of 82% in the training set.
- In the validation set, sensitivity and specificity were 65% and 94%; in the external set, they were 80% and 61%.
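For readers less familiar with the two metrics reported above, the following is a minimal Python sketch of how sensitivity and specificity are computed from a grader's confusion counts. The function name and the counts are hypothetical and chosen only for illustration; they are not the study's data.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Return (sensitivity, specificity) from confusion-matrix counts.

    tp: diseased eyes the algorithm flagged
    fn: diseased eyes the algorithm missed
    tn: healthy eyes the algorithm cleared
    fp: healthy eyes the algorithm flagged
    """
    sensitivity = tp / (tp + fn)  # share of true cases that are detected
    specificity = tn / (tn + fp)  # share of healthy eyes correctly cleared
    return sensitivity, specificity


# Hypothetical counts (NOT from the study): 100 diseased and 100 healthy eyes.
sens, spec = sensitivity_specificity(tp=80, fn=20, tn=61, fp=39)
print(f"sensitivity={sens:.0%}, specificity={spec:.0%}")
```

Neither number depends on how common the disease is in the group being screened, which is one reason a result from a curated image set can look better than the same algorithm's usefulness in the community.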
“I think we have a long way to go in terms of [using deep learning] for glaucoma screening,” explained Dr. Chang. The key will be to “develop the algorithm in the setting that we are intending to apply it to.” She added that, unlike diabetic retinopathy, where the diagnosis is made primarily with images, deep learning for diagnosing glaucoma is best combined with other modalities such as visual field testing.
To put into context the work that’s left to be done before this type of modality is relevant in practice, Dr. Chang pointed to a study published in 2018 (Li Z, et al). Even though the analysis reported an excellent area under the receiver operating characteristic curve, said Dr. Chang, “when you apply the results in a community setting, the positive predictive value is only 64%. We would have to screen more than 1,000 people in a community to identify one patient with glaucoma.”
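A strong ROC result can still translate into a modest positive predictive value (PPV) because PPV depends on how common the disease is among the people screened. The Python sketch below applies Bayes' rule to show the effect. The sensitivity, specificity, and prevalence values are hypothetical inputs chosen for illustration; they are not taken from either study and are not meant to reproduce the figures Dr. Chang quoted.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """Share of positive screens that are true cases, by Bayes' rule."""
    true_positive_rate = sensitivity * prevalence
    false_positive_rate = (1.0 - specificity) * (1.0 - prevalence)
    return true_positive_rate / (true_positive_rate + false_positive_rate)


# Hypothetical test characteristics (assumed, not from the source).
SENSITIVITY = 0.95
SPECIFICITY = 0.92

# Same test, three assumed prevalence levels: an enriched image set,
# a specialty clinic, and a general community screening population.
for prevalence in (0.50, 0.10, 0.02):
    ppv = positive_predictive_value(SENSITIVITY, SPECIFICITY, prevalence)
    print(f"prevalence={prevalence:.0%}  PPV={ppv:.1%}")
```

With the test characteristics held fixed, PPV falls as prevalence falls, so an algorithm tuned on a case-rich data set will generate proportionally more false referrals when moved to a community setting.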
So, what’s the key to achieving value in the clinical setting? Besides developing the algorithm in the community where it will be used, Dr. Chang said that the definition of glaucoma needs to be clear at the outset, and image quality needs to be improved, which can be challenging in a clinical setting.
We recently reported on another study evaluating a deep learning algorithm to help identify patients with glaucoma, which you can read here.
Li Z, He Y, Keel S, Meng W, Chang R, He M. Efficacy of a deep learning system for detecting glaucomatous optic neuropathy based on color fundus photographs. Ophthalmology. 2018;125(8):1199-1206. doi: https://doi.org/10.1016/j.ophtha.2018.01.023.
Chang D, Win S, Friedman D, Boland M. A deep learning algorithm for detecting referable glaucoma in a community eye survey. Talk presented at: AAO 2018 annual meeting; October 26-30, 2018; Chicago.