Can AI outperform ophthalmologists in diagnosing glaucoma?
According to a study published in Scientific Reports, AI models like GPT-4o show promise as supplementary tools in diagnosing complex glaucoma cases, but they currently lack the accuracy needed to replace human ophthalmologists, particularly for primary diagnosis.
In a prospective observational study conducted at a tertiary care ophthalmology center, researchers evaluated GPT-4o’s diagnostic performance against 3 ophthalmologists with varying experience levels. The study analyzed 26 cases of primary and secondary glaucoma using publicly available databases and institutional records.
GPT-4o underperformed human ophthalmologists in primary diagnosis. It achieved a mean accuracy score of 5.500, significantly lower than the top-performing ophthalmologist, who scored 8.038. GPT-4o's completeness scores were also lower than those of all participating ophthalmologists.
However, GPT-4o demonstrated comparable accuracy in differential diagnosis, scoring 7.577 against human scores of 7.615 and 7.673. It also surpassed all participants in differential diagnosis completeness, with a score of 4.096.
These findings underscore GPT-4o's potential as a supplementary tool for complex cases but suggest it is not yet suitable as a standalone diagnostic method.
Reference
Zhang J, Ma Y, Zhang R, et al. A comparative study of GPT-4o and human ophthalmologists in glaucoma diagnosis. Sci Rep. 2024;14(1):30385. doi: 10.1038/s41598-024-80917-x. PMID: 39639068.