Variations in specificity, safety amongst AI algorithms for diabetic retinopathy screening delay clinical use
Artificial intelligence diabetic retinopathy screening systems (AIDRSS) have a high negative predictive value and perform well overall, but large disparities in safety and specificity among systems warrant additional validation studies, according to a presentation at the 2020 ASRS Virtual Meeting.
In this multicenter study, 311,604 retinal images from 23,724 patients at 2 Veterans Affairs hospitals were graded by 6 algorithms (labelled A-F). Each image was categorized into one of stages 1-5, ranging from no diabetic retinopathy to proliferative diabetic retinopathy, or marked as ungradable because of image quality.
Across the 6 AIDRSS, sensitivity ranged from 51.0% to 85.9%, specificity from 58.9% to 83.7%, negative predictive value from 84.2% to 93.7%, and positive predictive value from 33.1% to 50.8%.
Patients with nonproliferative diabetic retinopathy or worse were under-referred at rates ranging from 0.6% to 19.9%. Patients with proliferative diabetic retinopathy alone were also under-referred (range, 0.9% to 33.6%). Diabetic retinopathy was overcalled at rates ranging from 16.3% to 41.1%.
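For readers less familiar with the screening metrics reported above, the following is a minimal sketch of how sensitivity, specificity, positive predictive value and negative predictive value are derived from a 2x2 table of referral decisions against a reference standard. The class name, function name and counts are hypothetical and chosen only for illustration. They are not data or code from the study, and the study's exact grading and referral thresholds are not described in this summary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Counts of algorithm decisions against the reference standard."""
    tp: int  # disease present, algorithm refers
    fp: int  # disease absent, algorithm refers (overcall)
    tn: int  # disease absent, algorithm does not refer
    fn: int  # disease present, algorithm does not refer (under-referral)


def screening_metrics(c: ConfusionCounts) -> dict[str, float]:
    """Return the four standard screening metrics as fractions in [0, 1]."""
    return {
        "sensitivity": c.tp / (c.tp + c.fn),  # share of diseased patients referred
        "specificity": c.tn / (c.tn + c.fp),  # share of healthy patients not referred
        "ppv": c.tp / (c.tp + c.fp),          # share of referrals that are true disease
        "npv": c.tn / (c.tn + c.fn),          # share of non-referrals that are truly healthy
    }


# Hypothetical counts for 1,000 patients with 10% disease prevalence.
example = ConfusionCounts(tp=80, fp=120, tn=780, fn=20)
metrics = screening_metrics(example)

for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
```

With these made-up counts, sensitivity is 80.0%, specificity 86.7%, positive predictive value 40.0% and negative predictive value 97.5%. The example shows a general property of screening at low disease prevalence: negative predictive value can be high while positive predictive value stays modest, the same pattern as the ranges reported above.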
Reference
Lee A, et al. Multicenter, head-to-head, real-world validation study of artificial intelligence diabetic retinopathy screening systems (AIDRSS). Presented at 2020 ASRS Virtual Meeting.