ChatGPT answered 350 questions of the kind used in the exams that qualify students for the medical profession. Some hospitals are already using it to rewrite even the most complex reports.
ChatGPT, the AI programme launched last November by OpenAI, quickly became famous for its ability to answer a wide variety of questions in natural, fluent language, with excellent writing skills. It continues to amaze, even in a specialist field like medicine.
ChatGPT uses a Transformer-type neural network to process and generate text on its own. This lets it answer users' questions and interact with them through a chat interface, using natural language and aiming to give coherent, relevant answers.
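As a rough illustration of that mechanism, the sketch below uses the open-source Hugging Face transformers library with GPT-2, a smaller, openly available model from the same Transformer family, as a stand-in for ChatGPT's much larger model (which is not publicly downloadable); the prompt is purely illustrative.

```python
# Minimal sketch: autoregressive text generation with a Transformer.
# GPT-2 stands in here for ChatGPT's (much larger) model; the prompt
# is purely illustrative.
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

prompt = "A common first-line treatment for hypertension is"
inputs = tokenizer(prompt, return_tensors="pt")

# Generate a continuation one token at a time: at each step the model
# predicts a distribution over the next token given all previous ones.
outputs = model.generate(
    **inputs,
    max_new_tokens=30,
    do_sample=True,
    top_p=0.9,
    pad_token_id=tokenizer.eos_token_id,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```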
When it was asked hundreds of questions from the standard tests used in the United States to grant or withhold a licence to practise medicine, ChatGPT proved more than capable of passing the exam.
The experiment was carried out by a group of researchers from several American universities, who subjected the AI programme to the standard examination, known as the United States Medical Licensing Examination (USMLE), without any specialised training or tutoring. The exam has three papers (Step 1, Step 2CK and Step 3), with questions covering most medical subjects, from biochemistry to diagnostic reasoning and bioethics.
Before the test, the researchers screened the questions and eliminated those based on the interpretation of radiological and other images (which requires specialised training). This left 350 of the 376 questions from the exams administered in the United States up to the spring of 2022.
The result, as the researchers explain in the scientific journal PLOS Digital Health, was remarkable: ChatGPT answered between 52.4% and 75% of the questions correctly, depending on the paper. Given that the pass mark is around 60%, the AI system would have had a good chance of passing. Moreover, ChatGPT's answers were highly consistent: 94.6% of them showed no internal contradictions, and in 88.9% of its answers it offered explanations that were neither banal nor obvious, demonstrating a kind of insight.
It is noteworthy that a similar system trained on PubMed, the world's largest repository of biomedical literature (PubMedGPT), rarely scored more than 50% of correct answers on the same USMLE questions, leaving it far behind ChatGPT.
“This artificial intelligence system,” the researchers write, “has also shown great potential for helping students to train and prepare, and they can therefore use it to study for tests and improve their knowledge.” In some US hospitals, ChatGPT is also already being used to rewrite the most complex reports so that patients can better understand the outcome of a test, diagnosis or treatment.
The authors of the study published in PLOS Digital Health also decided to use ChatGPT to rewrite the first draft of their article, assess the logical coherence of what they had written and reword the less clear passages. This is becoming increasingly common in scientific writing, though not all journal editors approve of the practice.
Finally, it is worth pointing out that, unlike some other 'chatbots' (computer programmes that use AI to simulate conversation with users), ChatGPT cannot search the Internet for its answers. Everything it produces is generated by an internal model that draws on the vast body of information it was trained on and is designed to predict the probability of each word based on the context of the words that precede it.
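To show roughly what that prediction step looks like, here is a minimal sketch, again using GPT-2 as an openly available stand-in and an illustrative context sentence, of how such a model turns the preceding words into a probability distribution over the next token.

```python
# Minimal sketch: how a language model scores the next token.
# GPT-2 stands in for ChatGPT's model; the context is illustrative.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

context = "The patient was diagnosed with type 2"
input_ids = tokenizer(context, return_tensors="pt").input_ids

with torch.no_grad():
    logits = model(input_ids).logits  # shape: (1, seq_len, vocab_size)

# The last position holds scores for the *next* token; softmax turns
# them into a probability distribution over the whole vocabulary.
probs = torch.softmax(logits[0, -1], dim=-1)
top = torch.topk(probs, 5)
for p, tok_id in zip(top.values, top.indices):
    print(f"{tokenizer.decode(int(tok_id)):>12s}  p = {p:.3f}")
```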