AI-assisted screening shows promise in glaucoma detection but requires further refinement
OpenAI’s GPT-4V demonstrates potential as a tool for glaucoma screening and detection through fundus image analysis, showing strong consistency in certain diagnostic features and moderate agreement with expert evaluations, according to a study. However, its performance varies across datasets and is slightly less accurate than human experts, highlighting room for improvement in its diagnostic capabilities.
Using 300 fundus images from 3 public datasets—ACRIMA, ORIGA, and RIM-One v3—the study analyzed GPT-4V’s ability to evaluate image quality, cup-to-disc ratio, rim thinning, and other glaucoma indicators. Each image underwent preprocessing to focus on the optic disc, and GPT-4V analyzed the images twice to assess consistency. Two expert graders independently reviewed the same images for comparison.
GPT-4V returned an assessment for all 300 images, though 35% required multiple prompts. Its accuracy in glaucoma detection was slightly lower than that of the experts, ranging from 0.68 to 0.81 across datasets compared with 0.72 to 0.88 for the experts. Agreement with experts varied widely by dataset, with Cohen kappa values between 0.08 and 0.72. However, GPT-4V showed high consistency in assessing image gradability (≥89%) and strong agreement in cup-to-disc ratio and rim thinning evaluations.
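For readers unfamiliar with the two metrics reported above, the following is a minimal sketch of how accuracy and Cohen kappa are computed for binary glaucoma labels. The label lists are invented for illustration (1 = glaucoma, 0 = normal) and are not data from the study, and the function names are hypothetical; the study's own statistical pipeline is not described here.

```python
from collections import Counter


def accuracy(predicted, truth):
    """Fraction of predictions that match the reference labels."""
    return sum(p == t for p, t in zip(predicted, truth)) / len(truth)


def cohen_kappa(rater_a, rater_b):
    """Cohen kappa: observed agreement corrected for chance agreement."""
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    labels = set(counts_a) | set(counts_b)
    # Chance agreement: probability both raters pick the same label at random,
    # given how often each rater used each label.
    expected = sum(counts_a[label] * counts_b[label] for label in labels) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical labels for 10 images (NOT from the study).
truth = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]   # dataset ground truth
model = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]   # model's calls
expert = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]  # one expert grader's calls

print(accuracy(model, truth))                 # 0.7
print(accuracy(expert, truth))                # 0.9
print(round(cohen_kappa(model, expert), 2))   # 0.58
```

In this toy example the model and the expert agree on 8 of 10 images, but because about half of that agreement would be expected by chance, kappa comes out lower than the raw agreement rate. This is why a kappa as low as 0.08 can coexist with a seemingly reasonable accuracy figure.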
The study concluded that GPT-4V demonstrates potential as a tool for glaucoma screening, agreeing well with expert evaluations on some diagnostic features, though its performance varies depending on the dataset and the feature assessed.
Reference
Jalili J, Jiravarnsirikul A, Bowd C, et al. Glaucoma Detection and Feature Identification via GPT-4V Fundus Image Analysis. Ophthalmol Sci. 2024;5(2):100667. doi: 10.1016/j.xops.2024.100667. PMID: 39877464; PMCID: PMC11773068.