Author summary Large language models (LLMs) are the most sophisticated form of language-based artificial intelligence. LLMs have the potential to improve healthcare, and experiments and trials are ongoing to explore potential avenues for LLMs to improve patient care. Here, we test state-of-the-art LLMs on challenging questions used to assess the aptitude of eye doctors (ophthalmologists) in the United Kingdom before they can be deemed fully qualified. We compare the performance of these LLMs to fully trained ophthalmologists as well as doctors in training to gauge the aptitude of the LLMs for providing advice to patients about eye health. One of the LLMs, GPT-4, exhibits favourable performance when compared with fully qualified and training ophthalmologists; and comparisons with its predecessor model, GPT-3.5, indicate that this superior performance is due to improved accuracy and relevance of model responses. LLMs are approaching expert-level ophthalmological knowledge and reasoning, and may be useful for providing eye-related advice where access to healthcare professionals is limited. Further research is required to explore potential avenues of clinical deployment.
Using it the real world in conjunction with a doctor will be interesting