New AI model can predict what diseases you’ll have in 20 years
This technology could be used to promote timely lifestyle changes or to design health policies, but experts warn of its potential misuse by insurers or banks

In the world of preventive and personalized medicine, the ability to foresee, well in advance — even decades ahead — the likelihood of a disease emerging is key. This allows interventions to halt the processes that would cause illness long before any harm occurs. That is precisely what artificial intelligence seems capable of achieving, according to a study published on Thursday in Nature.
Throughout life, we spend most days in good health, occasionally fall ill, and gradually begin to develop chronic conditions. These patterns vary for each individual, depending on genetics, lifestyle, and socioeconomic status. Understanding a person’s health and assessing their future risks requires more than the isolated diagnoses received over their lifetime; it necessitates a comprehension of each individual’s health trajectory, including the interplay of past diseases, to guide targeted lifestyle changes or recommend diagnostic tests that monitor conditions most likely to occur.
Today, a group of researchers from the European Bioinformatics Institute, the German Cancer Research Center (DKFZ), and several Danish institutions propose applying the same technology behind large language models — like ChatGPT — to learn and predict the natural history of more than a thousand diseases simultaneously. The resulting model, named Delphi-2M, can identify disease patterns from medical histories, lifestyle factors, and preexisting health conditions.
“The most unexpected finding was that the model can predict more than 1,000 diseases. We would have expected it to work for some, but fail for many others. This shows how interconnected many diseases are and highlights the need to investigate the underlying mechanisms that connect them,” says Moritz Gerstung, director of the DKFZ’s Division of Artificial Intelligence in Oncology and co-author of the study.
The algorithm was trained on data from 400,000 individuals in the United Kingdom and validated using records from nearly two million patients in Denmark. It can project health trajectories — both at the population and individual level — for up to two decades.
As with weather forecasts, the model does not provide certainties, only probabilities. Rather than predicting exactly what will happen to a specific individual at a given moment, it calculates the likelihood of developing certain diseases over a defined period. Short-term predictions are more reliable than long-term forecasts. For example, when estimating whether someone will have a heart attack in the next 10 years, the model is correct in roughly seven out of 10 cases. When the time frame extends to two decades, accuracy drops to 14%, slightly above the 12% achieved using only age and sex.
Regarding heart attacks, the model predicts that men aged 60 to 65 in the UK Biobank cohort have an annual risk ranging from 4 in 10,000 to 1 in 100, depending on their medical history and lifestyle habits. For women, the average risk is lower, though the range of probabilities is similar. Most importantly, when comparing the model’s predictions with real-world Biobank data not used in training, the calculated risks aligned closely with observed incidence across different age and sex groups, demonstrating that the estimates accurately reflect actual population trends.
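To put those annual figures in perspective, the following minimal sketch shows how a yearly risk compounds over a longer horizon. It assumes the annual risk stays constant and that each year is independent, which is a simplification for illustration and not how Delphi-2M produces its forecasts. The function name `cumulative_risk` is hypothetical.

```python
def cumulative_risk(annual_risk: float, years: int) -> float:
    """Probability of at least one event within `years`.

    Assumes a constant, independent annual risk (an illustrative
    simplification, not the method used in the study).
    """
    if not 0.0 <= annual_risk <= 1.0:
        raise ValueError("annual_risk must be between 0 and 1")
    if years < 0:
        raise ValueError("years must be non-negative")
    # Chance of staying event-free every year, subtracted from 1.
    return 1.0 - (1.0 - annual_risk) ** years


# The two ends of the annual range quoted for men aged 60 to 65.
low = cumulative_risk(4 / 10_000, 10)
high = cumulative_risk(1 / 100, 10)
print(f"10-year risk: {low:.2%} to {high:.2%}")
```

Under these assumptions, the two ends of the quoted range work out to roughly 0.4% and 9.6% over ten years.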
Delphi-2M achieves accuracy comparable to the best disease-specific models for conditions such as dementia or myocardial infarction and outperforms existing mortality prediction algorithms. Only in the case of diabetes does a blood marker — glycated hemoglobin (HbA1c) — remain more reliable. The study also identified diseases that increase the risk of others, such as mental disorders or certain female reproductive system tumors.
Asked about the concern that knowing about potential diseases decades in advance could turn healthy people into patients before they fall ill, Gerstung emphasizes the need for further studies to determine how such knowledge could benefit patients. This would require testing potential AI applications in medicine through randomized clinical trials, where one group receives AI-assisted consultations and another does not. After a follow-up period, researchers would evaluate whether the AI-assisted group fared better than the group receiving traditional care. “This could also include subjective assessments of people’s well-being to evaluate the emotional effects of knowing or not knowing their risks,” he adds.
On the potential risks of such powerful health prediction tools, such as insurers discriminating against high-risk individuals, Guillermo Lazcoz, a member of the Ethics Committee of the Carlos III Health Institute, notes that AI applied to large health databases adds “another layer of risk” to existing concerns. One such concern is that these data could end up in the hands of a bank, which might assess creditworthiness based on a person’s likelihood of developing cancer or having a heart attack.
“AI can identify a person from data that was supposed to be anonymous, which requires new protection measures,” Lazcoz continues. To implement these measures, “in Europe, secure data processing spaces are being developed, where data does not travel and access to third parties is limited in time and purpose,” he explains. Lazcoz also warns of a significant difference between organizations like the UK Biobank, whose data were used in the Nature study and which has strict controls, and companies like 23andMe, where users can have their DNA analyzed to learn about their ancestry. The latter has already been involved in scandals over data protection issues.
Large language models like GPT-4 or Gemini learn language as a sequence of words, predicting the next word based on context. Researchers saw an analogy in health: a person’s medical history can similarly be viewed as a sequence of events — diagnoses, risk factors, lifestyle habits — that follow a temporal order to enable predictions.
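The analogy can be made concrete with a deliberately simple sketch. Instead of the transformer-style architecture behind large language models, which takes the whole history into account, the code below uses a plain counting model that looks only at the most recent event. The patient histories, event names and function names are all invented for illustration and do not come from the study.

```python
from collections import Counter, defaultdict


def fit_next_event_counts(histories):
    """Count how often each event is directly followed by another."""
    counts = defaultdict(Counter)
    for history in histories:
        # Pair every event with the one that comes right after it.
        for current, following in zip(history, history[1:]):
            counts[current][following] += 1
    return counts


def next_event_probabilities(counts, last_event):
    """Turn the counts for `last_event` into probabilities."""
    followers = counts.get(last_event)
    if not followers:
        return {}  # event never seen with a successor
    total = sum(followers.values())
    return {event: n / total for event, n in followers.items()}


# Made-up histories, ordered in time (not real patient data).
histories = [
    ["smoker", "hypertension", "heart_attack"],
    ["smoker", "hypertension", "stroke"],
    ["obesity", "hypertension", "heart_attack"],
    ["obesity", "type_2_diabetes"],
]

counts = fit_next_event_counts(histories)
probs = next_event_probabilities(counts, "hypertension")
print(probs)
```

In this toy data set, two of the three histories that include hypertension continue with a heart attack, so the model assigns that event a probability of two thirds. A language model does the same kind of thing with words, only with far more context and far more data.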
For now, the model still needs refinement to be directly useful in patient care, but it is already a useful tool for better understanding how diseases develop and progress, and for evaluating how lifestyle and past illnesses influence future disease risk.
One of the most innovative aspects of the work is Delphi-2M’s ability to generate synthetic health data. Starting from partial information, the model can generate complete trajectories that maintain the same statistical properties as real data but do not correspond to any particular individual. This protects patient privacy, as the data cannot be linked to real individuals, and allows other AI models to be trained without access to sensitive clinical data. The model could also be used to estimate, for example, the potential health impact if obesity rates increased by 5% in a population.
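As a rough illustration of what generating synthetic trajectories means, the sketch below extends a partial history by repeatedly drawing the next event at random from a table of probabilities. The table, the event names and the function `sample_trajectory` are hypothetical and invented for this example; the real model learns its probabilities from data and conditions on far more than the last event.

```python
import random

# Hypothetical transition probabilities, invented for illustration only.
# "END" marks the point where a trajectory stops.
TRANSITIONS = {
    "obesity": {"hypertension": 0.5, "type_2_diabetes": 0.3, "END": 0.2},
    "hypertension": {"heart_attack": 0.3, "stroke": 0.2, "END": 0.5},
    "type_2_diabetes": {"hypertension": 0.4, "END": 0.6},
    "heart_attack": {"END": 1.0},
    "stroke": {"END": 1.0},
}


def sample_trajectory(prefix, rng, max_events=10):
    """Extend a partial history by sampling one event at a time."""
    if not prefix:
        raise ValueError("prefix must contain at least one event")
    trajectory = list(prefix)
    while len(trajectory) < max_events:
        options = TRANSITIONS.get(trajectory[-1])
        if not options:
            break  # no known continuation for this event
        next_event = rng.choices(
            list(options), weights=list(options.values()), k=1
        )[0]
        if next_event == "END":
            break
        trajectory.append(next_event)
    return trajectory


rng = random.Random(0)  # fixed seed so the run is repeatable
synthetic = [sample_trajectory(["obesity"], rng) for _ in range(1000)]
share = sum("heart_attack" in t for t in synthetic) / len(synthetic)
print(f"Synthetic trajectories with a heart attack: {share:.1%}")
```

None of the sampled trajectories belongs to a real person, yet taken together they follow the probabilities in the table. Changing the starting mix of histories and sampling again is one simple way to explore a scenario at population level.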
There are already algorithms that predict the risk of specific diseases, such as heart disease or breast cancer, but such single-disease tools do not capture the true complexity of human health, where multiple diseases often coexist and influence each other. In increasingly aging societies, the ability to anticipate the burden of multiple diseases and design preventive policies and investments will be critical.