AI’s black box problem: Why is it still indecipherable to researchers
Some artificial intelligence algorithms, particularly neural networks, have internal processes that remain largely unknown. This lack of transparency is one of the sources of biases and hallucinations
When a neural network is operational, even the most specialized researchers are often left in the dark about what’s happening. This isn’t a discussion about biology; rather, it concerns artificial intelligence algorithms — specifically those based on deep learning, which mimic the connections between neurons. These systems operate as black boxes, remaining indecipherable even to data scientists, the brightest minds in academia, and engineers at OpenAI and at Google, whose researchers were recently awarded a Nobel Prize.
While the mathematics underpinning these algorithms is well understood, the behavior generated by the network is not. “Although we know what data enters the model and what the output is — that is, the result or prediction — we cannot clearly explain how this output was reached,” explains Verónica Bolón Canedo, an AI researcher at the Information and Communications Technology Research Center (CITIC) of the University of A Coruña in Spain.
This is true for ChatGPT, Google Gemini, Claude (the model from the startup Anthropic), Llama (from Meta), and image generators like DALL-E. It also applies to any system that relies on neural networks, including facial recognition applications and content recommendation engines.
In contrast, other artificial intelligence algorithms, such as decision trees or linear regression — commonly used in fields like medicine and economics — are more interpretable. “Their decision-making processes can be easily understood and visualized. You can follow the branches of the tree to see exactly how a particular result was reached,” explains Bolón.
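To see that difference in concrete terms, a small decision tree can literally be printed as a chain of if/then rules and read branch by branch. The minimal sketch below uses the scikit-learn library and the classic Iris dataset, chosen here purely for illustration and not drawn from the researchers quoted:

```python
# A minimal sketch of an interpretable model: every decision the tree makes
# can be printed and followed step by step.
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
tree = DecisionTreeClassifier(max_depth=2, random_state=0)
tree.fit(iris.data, iris.target)

# Each split is an explicit, human-readable rule.
print(export_text(tree, feature_names=list(iris.feature_names)))
```

The printed output is a short list of threshold rules, which is exactly what makes this family of models easy to audit.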
This clarity is crucial, as it injects transparency into the process and provides assurances to users of the algorithm. Notably, the EU AI Act emphasizes the importance of having transparent and explainable systems. Unfortunately, the very architecture of neural networks hampers this transparency. To comprehend the black box of these algorithms, one must visualize a network of interconnected neurons or nodes.
“When you input data into the network, you initiate a series of calculations using the values present in the nodes,” explains Juan Antonio Rodríguez Aguilar, a research professor at the Artificial Intelligence Research Institute (IIIA) of the Spanish National Research Council (CSIC). The information enters the first nodes and then spreads, traveling as numerical values to subsequent nodes, which in turn relay it to the next ones. “Each node calculates a number and sends it to all its connections, taking into account the weight [the numerical value] of each connection. The new nodes that receive this information then calculate another number,” adds the researcher.
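In code, that propagation amounts to a chain of weighted sums and nonlinear activations. The sketch below is a toy forward pass; the layer sizes and random weights are arbitrary, chosen only to illustrate the mechanism Rodríguez Aguilar describes:

```python
import numpy as np

# A toy two-layer network: each node computes a weighted sum of its inputs
# plus a bias, applies a nonlinearity, and passes the result on.
rng = np.random.default_rng(0)
x = rng.normal(size=4)                                  # values entering the first nodes
W1, b1 = rng.normal(size=(8, 4)), rng.normal(size=8)    # weights of the connections
W2, b2 = rng.normal(size=(3, 8)), rng.normal(size=3)

hidden = np.maximum(0, W1 @ x + b1)   # each hidden node sends its number onward
output = W2 @ hidden + b2             # the next nodes combine what they receive
print(output)
```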
It’s important to note that current deep learning models consist of millions or even billions of parameters. These parameters are the values attached to the nodes and connections of the network once it has been trained — in other words, all the values that can influence the outcome of a query. “In deep neural networks, there are many elements that are multiplied and combined. You have to imagine this in terms of millions of elements. It’s impossible to derive a meaningful equation from it,” says Bolón. The variability is exceedingly high.
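A back-of-the-envelope calculation shows how quickly those numbers grow. The sketch below counts the weights and biases of a fully connected network; the layer widths are arbitrary and serve only to illustrate the scaling:

```python
# Rough parameter count for a fully connected network with the given layer widths.
def count_parameters(layer_sizes):
    # Each connection between consecutive layers has a weight, plus one bias per node.
    return sum(m * n + n for m, n in zip(layer_sizes[:-1], layer_sizes[1:]))

print(count_parameters([784, 128, 10]))   # a small image classifier: ~100,000 parameters
print(count_parameters([4096] * 10))      # ten wide layers: already ~150 million
```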
Some industry sources estimate that GPT-4 has nearly 1.8 trillion parameters. According to this analysis, each query would draw on about 220 billion of those parameters. That means 220,000,000,000 variables can affect the algorithm’s response every time a question is posed.
The effort to detect biases and other problems
Due to the opacity of these systems, correcting biases becomes increasingly challenging. This lack of transparency fosters distrust, particularly in sensitive areas such as healthcare and justice. “If I understand how the network operates, I can analyze it and predict potential errors or issues. It’s a matter of security,” warns Rodríguez Aguilar. “I want to know when it works well and why — and when it doesn’t work well and why.”
Major players in AI are aware of this limitation and are actively working on initiatives to gain a better understanding of how their models function. OpenAI’s approach involves using one neural network to observe and analyze the mechanisms of another neural network. Meanwhile, Anthropic — another leading startup, founded by former OpenAI researchers — is investigating the connections that form between nodes and the circuits generated as information propagates. Both companies seek to examine elements smaller than nodes, such as the activation patterns of those nodes and their connections, to analyze network behavior. They aim to work with the simplest components first, intending to scale up their findings, though this is no easy task.
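Conceptually, the raw material for this kind of analysis is the pattern of activations inside the network. The sketch below is a simplified illustration — not the actual tooling used by OpenAI or Anthropic — of recording which hidden nodes of a toy network respond to which inputs:

```python
import numpy as np

# Hypothetical, simplified illustration: record the activation pattern of each
# hidden node for a handful of inputs to a toy network.
rng = np.random.default_rng(1)
W1, b1 = rng.normal(size=(8, 4)), rng.normal(size=8)

inputs = rng.normal(size=(5, 4))                 # five example inputs
activations = np.maximum(0, inputs @ W1.T + b1)  # one activation per hidden node per input

# Which node fires most strongly for each input -- a first, crude window
# into what individual components of the network are responding to.
print(activations.argmax(axis=1))
```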
“Both OpenAI and Anthropic are attempting to explain much smaller networks. OpenAI is focusing on understanding the neurons of GPT-2, because the GPT-4 network is too large. They need to start with something smaller,” explains Rodríguez Aguilar.
Decoding this black box would yield significant benefits. In language models — currently the most popular algorithms — such understanding could prevent incorrect reasoning and reduce the infamous hallucinations. “One problem that could potentially be solved is that systems often give inconsistent answers. Currently, the process is quite empirical. Since we don’t know how to interpret the network, the most exhaustive training is performed, and if that training succeeds and the tests are passed, the product is launched,” Rodríguez Aguilar explains. However, this process does not always produce desirable outcomes, as evidenced by the initial launch of Google Gemini, which mistakenly generated images of Nazis with Asian features and of Black Vikings.
This concern about how transparently algorithms operate is also reflected in legislation. “The European AI Act requires that developers provide clear and understandable explanations of how AI systems work, especially in high-risk applications,” says Bolón. However, she clarifies that these systems can still be used as long as users receive sufficient explanations regarding the basis for the decisions made by the system.
Rodríguez Aguilar agrees that there are tools available to elucidate the results of an algorithm, even if the exact workings of the process remain unclear. “However, what concerns me most — beyond explainability and transparency — is robustness; that these systems are safe. What we are looking to do is to identify circuits within the network that may pose safety risks and lead to unsafe behavior.”
The ultimate objective is to maintain control over AI, particularly when it is deployed in sensitive areas. “If you are going to implement an AI that suggests treatments in a hospital, drives an autonomous vehicle, or provides financial recommendations, you must be certain that it functions correctly.” This underscores why researchers are so intently focused on understanding what occurs within the inner workings of an algorithm; it’s not mere scientific curiosity.