AI surpasses expectations in its ability to assist doctors in specific areas of health care

The Google Research team behind Med-PaLM doesn’t envision autonomous use or replacing doctors

Daniel Mediavilla

Jul 14, 2023 - 01:17 CEST

In Woody Allen’s movie Deconstructing Harry, he brilliantly captures the thinking of people convinced they’re sick. “The most beautiful words in the English language are not ‘I love you,’ but ‘It’s benign.’” But our incessant Googling for self-diagnosis can cause even more anxiety, placing an additional burden on physicians. Now, imagine a world where this same company, already an indispensable source for car navigation and dinner reservations, becomes your go-to place for answers to life’s existential questions. By harnessing the power of artificial intelligence (AI), Google can now provide precise answers to your medical queries.

In a recent article published in Nature, a Google team presented Med-PaLM, a generative AI model similar to ChatGPT, which is trained on large databases and can organize information to provide meaningful (but sometimes inaccurate) responses. According to the study, version two of Med-PaLM achieved an 86.5% accuracy rate on multiple-choice exams similar to those taken by doctors for board certification, an improvement of about 19 percentage points over version one.

In the Nature study, the authors, primarily members of Google Research, tested their models extensively on large databases of medical questions and answers, which included over 3,000 of the most frequently searched questions on the internet. One of the authors, Shek Azizi, highlighted that the system progressed from a modest level of expertise to an expert level within a span of three months. The models were evaluated on their ability to answer the questions accurately. A group of independent medical experts estimated that 92.9% of the comprehensive answers generated by Med-PaLM aligned with the scientific consensus, slightly surpassing the 92.6% agreement achieved by human doctors. The study also found that 5.8% of Med-PaLM's answers could lead to harmful outcomes, marginally lower than the 6.5% rate observed in responses provided by doctors. While these findings are promising, the authors emphasize the need for further research before these models are used in clinical care settings. According to Azizi, the system is not intended for autonomous use or to replace doctors.

Josep Munuera, Director of Radiology at the Hospital de la Santa Creu i Sant Pau in Barcelona and an expert in healthcare technology, believes that these models can be useful, but warns that “doctors do more than just answer questions” like the ones given to the AI system. “Thorough exploration and attentive observation of nonverbal cues are imperative in order to provide an accurate diagnosis,” he said. After the in-person consultation and examination, technologies such as Google’s can save time and effort by generating a comprehensive patient report or therapeutic plan. “It can also be valuable as a means of supporting the generation of diagnostic ideas and facilitating searches of scientific information in extensive databases. Nonetheless, it’s absolutely necessary to have humans verify the AI’s suggestions and assume responsibility for the final decision,” said Munuera. “The doctor’s role has various facets, with profound reliance on human interaction. Our objective is to leverage AI to augment a doctor’s capacity to deliver improved treatment,” said Azizi.

In an interview with EL PAÍS, Regina Barzilay, an MIT scientist and expert in AI applications in medicine, says that these systems learn on their own from patterns provided to them, and can outperform humans in some skills. “However, our capacity to detect errors in them is limited. We must adapt to a world where technology autonomously makes decisions beyond our direct oversight,” said Barzilay. When using ChatGPT, we’ve seen how these systems can produce believable answers that sometimes contain inaccuracies. This is especially challenging when the responses are well expressed, which makes the errors harder to identify. In healthcare, where accuracy is crucial, not knowing the source of the answers produces uncertainty and doubt.

In certain applications that do not involve diagnosing patient diseases but instead focus on knowledge discovery, the artificially generated elements known as “hallucinations” in AI-generated texts may not pose an issue. “Hallucinations and creativity are two sides of the same coin, particularly in applications like drug repurposing and gene-disease association. Such endeavors require a certain level of creativity, which in turn facilitates the process of innovation and discovery,” said Azizi.

José Ibeas, a nephrologist at Parc Taulí Hospital in Sabadell (near Barcelona) and secretary for the Big Data and Artificial Intelligence Group of the Spanish Society of Nephrology, believes that this technology is the future and will greatly enhance medical treatment. However, he acknowledges that there is still much to learn. “For instance, while information may be gathered from reputable sources, not all studies hold equal weight and there is a frequent scarcity of studies with negative data or failed experiments. AI generates text based on these sources, but the exact selection of content from each article remains unknown, potentially leading to biases,” said Ibeas. “A treatment may prove beneficial for a population group experiencing a specific disease in a particular environment, and fail to yield the same results for another population group.”

Ibeas believes these AI models can be a valuable resource for doctors, but it’s crucial to establish their efficacy through rigorous verification processes, similar to the regulatory processes governing approval of other medical products. “This can be achieved by comparing outcomes between doctors who regularly leverage this technology [and those who do not],” he said. Ibeas also stressed the importance of applying this technology cautiously and wants to see comprehensive training of healthcare professionals. He would also like to limit the use of these models to cases where the benefits are undeniable. “This approach will help avoid the common pitfall of bowing to commercial pressures, which could produce mistakes that force people to abandon a genuinely beneficial medical technology.”

An important aspect of generative language models is their ability to provide high-quality health information to people who previously lacked access. The authors of the Nature study say they compared the Med-PaLM results with those of highly skilled experts and found that the AI system performed well. This led some doctors to express concern that AI systems could lead to more resource cuts, while acknowledging the usefulness of models like Med-PaLM in overextended health care systems.
