Yoshua Bengio, Turing Award winner: ‘There is empirical evidence of AI acting against our instructions’
The scientist, a pioneer in neural networks, warns that ‘the capabilities of artificial intelligence continue to advance at a rate which seems faster than risk management practices’

Yoshua Bengio, 64, is one of the pioneers in the development of deep learning, the most promising artificial intelligence (AI) technique today. His research in this field was recognized with the prestigious Turing Award — often described as the Nobel Prize of computer science — which he received in 2018 along with Yann LeCun and Geoffrey Hinton. A Canadian national and professor at the Université de Montréal, Bengio is world-renowned for the extraordinary level of scientific rigor in his research and is now a leading advocate for raising awareness of the risks involved in the development of AI. He chairs the International AI Safety Report, an annual study that seeks to compile scientific evidence on emerging AI risks to support decisions for managing them, and whose second edition is published this Tuesday.
In an interview with this newspaper — arranged on the occasion of the World Economic Forum in Davos, in which he participated, but conducted via videoconference on January 30 due to scheduling conflicts — Bengio warned that there is “empirical evidence and laboratory incidents where AI is acting against our instructions” and that “the capabilities of AI continue to advance at a rate which seems faster than risk management practices.” Those risks include the misuse of systems (fraud, mind manipulation), dysfunction (errors, loss of control), and systemic consequences, such as the impact on the labor market or on cognitive processes.
Question. What are the most worrying signs you identify when analyzing the evolution of current models and agents?
Answer. There are two elements in the report that I think reflect some of the most serious concerns many scientists have. Two things are happening at the same time. One is the continued advance in AI capabilities, more specifically in reasoning, which includes being able to strategize in order to achieve goals, something you might think is a good thing. That is, AIs are getting smarter. But at the same time, we’re seeing empirical evidence and laboratory incidents where AIs are acting against our instructions, sometimes with an apparent drive to preserve themselves and a willingness to be deceptive in order to avoid our oversight, in order to avoid being replaced by a new version, and things like that. So, these two things together are problematic.
Q. This points to the risk of a loss-of-control scenario. How would you characterize it? How serious is that risk?
A. The most important thing to understand is that it’s an emerging risk. It hasn’t happened. And it is one for which the probability that it will happen for real is very hard to estimate, and AI scientists don’t agree about its probability. But what we can all agree on is that if it does happen, it could be catastrophic. It could be terrible. That’s why it deserves attention. What the report really focuses on is that we have those early signs, which still require improved methodology and more systematic conclusions. But these early signs, because of the severity of those risks, do mandate increased monitoring and research to understand why this is happening and how it can be mitigated.
Q. Perhaps one of the most worrying risks is the misuse of these systems to try to manipulate people’s minds and, therefore, manipulate the proper development of democratic life. What is your opinion, and where do we stand on this issue?
A. What the report does is document the studies that are showing various aspects of this risk, which is already happening to some degree, but there’s also the possibility that it could get worse. Let me elaborate on this. First, it’s very obvious that content generated by AI is getting more and more difficult to distinguish from real content, because the AI is getting better, whether it is images, text, voice or video. Another area of concern is personalized persuasion and manipulation. There have been a number of studies showing, in laboratory settings, that the frontier models — the most powerful ones — are at least as good as humans at persuasion. Persuasion means somebody will actually change their mind on a topic after several rounds of dialogue. There’s a real risk of very large-scale influence on public opinion, because such bots could be multiplied, and you can have millions of these things. And this calls for some kind of intervention in order to mitigate those risks.
Q. These machines can create a pattern where people give up the effort of solving problems, which is one of the foundations of intelligence and its development. How concerned are you about the effect on human cognitive abilities?
A. That is a difficult question. I don’t think we have very clear answers from science at this point. I suspect it’s going to depend on context. It might be bad for some people, for some types of users or some types of relationship between the person and the AI, and it might actually be good for others. The good example would be to think of the AI as an assistant. I use AI as a research assistant, and the assistant is not perfect. They can make mistakes, but they have much more time than you do, so they can read the literature and give you reports about relevant information that can be useful to you. Of course, this could become something unhealthy if the decisions are really taken by the AI and you’re almost a passive actor in the decision-making process. That’s where it becomes dangerous, for a number of reasons. There is a kind of decline if we rely too much on the AI, especially for children. The other issue is connected to the loss of control problem.
Q. How?
A. If we rely a lot on AIs that are not really trustworthy because they have their own intentions, that’s also dangerous. Another misaligned behavior that is creating problems is sycophancy: the AI is not telling you the truth, but what you want to hear. And that is already causing mental health problems, because people can have delusions, and the AI will go along with those delusions. It’s a little bit like social media: you’re going to get your crazy ideas reinforced, the bubble effect. It’s something similar with AI, where the AI will go along with your crazy ideas or even amplify them in a feedback loop. That has given rise, at least anecdotally and via lawsuits, to episodes of psychosis and even tragic incidents where people harm themselves, even kill themselves.
Q. I would now like to ask you about your assessment of the work being done by large companies that develop these models and agents in relation to security measures. How do you assess that work?
A. If we look at the change from a year ago, there has been a lot of progress. Many more of the leading companies are implementing some form of explicit risk management. Several are working with the Frontier Model Forum to synchronize their risk management practices. I believe there are currently about 12 companies that have published their risk management processes. In addition, there have been advances on the legislation side in Europe, as you know, but also in the United States and in China. We’re seeing a gradual convergence of these different safety requirements and public-protection risk management processes, which is a good thing. The downside is that the capabilities of AI continue to advance at a rate which seems to be faster than risk management practices.
Q. You mentioned regulation by public actors, such as states or the European Union. In Europe, we often hear the argument that there is over-regulation, and that this stifles innovation, while the United States and China are moving faster. How would you describe this situation and its consequences?
A. As chair of the report, I must point out that the mandate is very explicit that we’re not making policy recommendations. We focus on providing scientific evidence that can inform policy debate. But, if I change hats and speak as an independent university professor and as someone who was involved in the European effort, I can say that the requirements of the EU AI Act’s Code of Practice only apply to the largest models. In Europe, very few companies exceed that threshold. These requirements do not place an undue burden on companies of that size, and most large U.S. companies already accept them because, to a large extent, that is what they were already doing. They do, however, introduce a degree of transparency towards governments and thus the public, which many leading companies consider a good thing. In my view, the real reasons why Europe has been lagging have nothing to do with those regulations; they have to do with a lack of large-scale investment, a lack of willingness to take risks, and other factors related to the structure of the European economy. Europe needs to wake up to these realities and invest massively in order to be, as Canadian Prime Minister Mark Carney said in his Davos speech, at the table rather than on the menu.
Q. Perhaps it would be ideal to have something similar to what already exists in the nuclear sector, with the Non-Proliferation Treaty and the International Atomic Energy Agency. But that doesn’t seem viable right now for AI. So, what can be done?
A. Yes. The problem is that the main perception of many countries and companies is that they are in competition with other companies and other countries. And that isn’t a good starting point for any kind of international coordination, treaty and so on. What the report hopefully helps to do is establish the facts, the scientific evidence, so that countries realize that they also have a lot to lose in common. Catastrophic misuse or loss of control would be bad whether you’re Chinese or American. None of us should accept those risks. It’s just that, unfortunately, the scientific and geopolitical understanding of the magnitude of those risks still seems to be weak. What I’m hoping, as we move forward with scientific evidence about these and other risks, is that governments will see it is in their self-interest to seek international coordination to mitigate them, similar to what happened, as you said, with nuclear weapons.
Q. At the beginning of the Divine Comedy, Dante Alighieri encounters a she-wolf representing lust for power and a lion representing pride. In the work, he argues that the she-wolf was the worst beast of all. In the context of AI, is it the wolf we should fear more, or the lion? Greed or arrogance?
A. I would add that there is scientific evidence for the things you’re talking about, not just poetry; scientists have also studied psychology and social psychology. And we tend to easily blind ourselves to reality. This is called motivated cognition. In other words, we will not even have the thoughts that would allow us to see reality as it is. Instead, consciously or unconsciously, we can be driven by greed and hubris. And it’s not necessarily because the person has bad intentions. In fact, they may sincerely think they are doing the right thing. But it’s not necessarily rational, nor is it even grounded in the values the person professes. That includes all of us, myself included. It’s human nature.