Chile searches for those missing from Pinochet dictatorship with the help of artificial intelligence

At the end of August, Chilean president Gabriel Boric launched the Search Plan to find more than 1,000 Chileans. Today, old judicial documents, many of them typewritten, have been digitized so that cutting-edge technology can be applied to cross-reference the data.

Relatives of disappeared detainees walk to the General Cemetery on the 50th anniversary of the coup d'état in Chile. (Sofía Yanjarí)
Ana María Sanhueza

On Monday, January 15, at the inauguration of the “Congress of the Future” in Santiago, President Gabriel Boric stated that artificial intelligence, the theme of the 13th edition of the conference, “will play an important role in the search for our missing detainees.” He was referring to the Search Plan to find over 1,000 individuals who were victims of the Augusto Pinochet dictatorship (1973-1990), which his Administration presented on August 30, 2023, on the eve of the September 11 commemoration of the 50th anniversary of the coup d’état that ousted Salvador Allende, the socialist president.

The plan, spearheaded by Justice Minister Luis Cordero, is intended to become a permanent State policy. According to Justice Ministry data, the dictatorship in Chile left 1,469 victims of forced disappearance: 1,092 are missing detainees, and 377 are people who were executed and are also missing. So far, only 307 have been identified.

To embark on this new search, which the courts have already begun, Cordero tells EL PAÍS that he is working with two main sources. The first is the judicial investigations, which comprise millions of pages. The second is the administrative records of the cases, which are scattered across state agencies. These include the Human Rights Program, created in 1997, which falls under the Ministry of Justice, as well as earlier investigations by military prosecutors’ offices (which used to close the cases) and the files that formed the basis of the 1991 report of the National Commission for Truth and Reconciliation, promoted by former president Patricio Aylwin (1990-1994), which gave an account of the victims for the first time.

Typewritten documents

Unsholster, a company specialized in data analysis, data science and software development, whose general manager is the civil engineer Antonio Díaz-Araujo, is behind the technological analysis of the information. The Human Rights Program has already digitized its records, while the Judicial Branch’s files are 80% digitized. The firm won the project in a bidding process held as part of the Search Plan, and it is in charge of implementing the artificial intelligence.

A relevant detail of this investigation is that the judicial files, organized by case, were processed under Chile’s old justice system (replaced in 2005), which means the judges’ inquiries are on paper. Most of them have pages sewn by hand into notebooks and are typewritten, and several parts are handwritten. They contain statements, black-and-white photographs, photocopies of photos, forensic reports and old police reports.

In addition, the judicial inquiries undertaken since 2000 will provide a more up-to-date and crucial source of information for the analysis. Since then, hundreds of cases that had been shelved during the dictatorship have been reopened by judges dedicated exclusively to cases of human rights violations, and sentences have been handed down.

Cordero points out that “there is a lot of information in the hands of the State and there is no human capacity to process it, because it needs to be interconnected. For example, there are testimonies that appear in some files and not in others. And, in addition, depending on the judges, there were lines of investigation, so there may have been precedents that were useful for some and not for others.” For this reason, the justice minister says artificial intelligence can play a key role, as he believes that in these cases, the cross-referencing of information will be crucial.

“All that information is in judicial and administrative files, and what digitization accomplishes first is to integrate them in one place. And then to work with artificial intelligence, which allows us to reduce the investigation gaps using algorithms, which are being tested, and which can read, for example, dates, names and places in those files,” the minister adds.

4.7 million pages and counting

Unsholster is currently in the pre-project stage, before it starts programming, Díaz-Araujo explains to EL PAÍS. “But we have already touched on most of the file types that we will need to deal with,” he says. The documents that have been coming in, scanned sheet by sheet, arrive in folders as PDF files. Because each page is stored as an image, its text cannot be searched. For this reason, the first step has been to apply OCR (Optical Character Recognition) technology to turn the images into data.

The material received so far, which does not yet include the thousands of files of the Judicial Branch, totals 46,768 PDF files, or more than 4.7 million pages. “If a person were to read every one of those pages, out loud and without understanding or relating facts, they would probably spend eight hours a day reading for 27 years,” explains the civil engineer.
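The engineer's estimate can be checked with simple arithmetic. The sketch below assumes reading on all 365 days of the year, which the quote does not specify; under that assumption, the figure works out to roughly one page per minute.

```python
# Figures quoted in the article.
PDF_FILES = 46_768
PAGES = 4_700_000          # "more than 4.7 million pages"
HOURS_PER_DAY = 8
YEARS = 27

# Assumption (not stated in the article): reading every day of the year.
DAYS_PER_YEAR = 365

reading_hours = YEARS * DAYS_PER_YEAR * HOURS_PER_DAY
pages_per_hour = PAGES / reading_hours
minutes_per_page = 60 / pages_per_hour
pages_per_file = PAGES / PDF_FILES

print(f"Total reading time: {reading_hours:,} hours")
print(f"Reading speed: {pages_per_hour:.1f} pages per hour")
print(f"Time per page: {minutes_per_page:.2f} minutes")
print(f"Average file length: {pages_per_file:.1f} pages")
```

In other words, the 27-year figure corresponds to a reader who gets through about 60 pages an hour, with an average file running to about 100 pages.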

Once those files have been split into pages, Díaz-Araujo says, “a big classification tree is created, which allows you to classify pages that have images, manuscripts, typewritten pages, or Word-style files. And then you start to apply, on each one of them, the best OCR” for each type of page, because the key, he adds, lies in “what material is brought to each one.”
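The idea of classifying each page and then sending it to the most suitable OCR can be sketched as a small decision tree plus a dispatch table. This is only an illustration: the article does not describe Unsholster's actual classifier or engines, so the page features (`has_text_layer`, `text_area_ratio`, `stroke_regularity`), the thresholds and the engine names below are all hypothetical, and the engines are stubs rather than real OCR calls.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Dict


class PageType(Enum):
    IMAGE = "image"
    HANDWRITTEN = "handwritten"
    TYPEWRITTEN = "typewritten"
    DIGITAL_TEXT = "digital_text"   # "Word-style" pages


@dataclass
class Page:
    file_name: str
    number: int
    has_text_layer: bool       # hypothetical: PDF page already carries text
    text_area_ratio: float     # hypothetical: share of the page covered by text
    stroke_regularity: float   # hypothetical: 1.0 means perfectly regular glyphs


def classify(page: Page) -> PageType:
    """A tiny hand-written decision tree; the thresholds are illustrative."""
    if page.has_text_layer:
        return PageType.DIGITAL_TEXT
    if page.text_area_ratio < 0.1:
        return PageType.IMAGE
    if page.stroke_regularity >= 0.7:
        return PageType.TYPEWRITTEN
    return PageType.HANDWRITTEN


def make_stub_engine(name: str) -> Callable[[Page], str]:
    """Stand-in for a real OCR engine: it only reports which engine was chosen."""
    def engine(page: Page) -> str:
        return f"[{name}] {page.file_name} p.{page.number}"
    return engine


# One engine per page type, so each kind of material gets the tool that suits it.
ENGINES: Dict[PageType, Callable[[Page], str]] = {
    PageType.IMAGE: make_stub_engine("photo-archive"),
    PageType.HANDWRITTEN: make_stub_engine("handwriting-ocr"),
    PageType.TYPEWRITTEN: make_stub_engine("typewriter-ocr"),
    PageType.DIGITAL_TEXT: make_stub_engine("text-extractor"),
}


def process(page: Page) -> str:
    return ENGINES[classify(page)](page)


sample = Page("case.pdf", 3, has_text_layer=False,
              text_area_ratio=0.6, stroke_regularity=0.9)
print(process(sample))
```

Keeping the routing in a table means an engine can be swapped for a better one without touching the classifier.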

Another stage, he explains, is to create different types of dictionaries and entities “that can be learned with use.” For example, nicknames of people, places, streets (many have changed names since the dictatorship), ways of writing and dates.
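A dictionary of this kind can be pictured as a lookup table that maps every known variant of a name to one canonical form and that grows as analysts confirm new variants. The following is a minimal sketch under that assumption; the class name `EntityDictionary` and the street names are invented for illustration and do not come from the project.

```python
import unicodedata
from typing import Dict, Optional


def normalize(text: str) -> str:
    """Lowercase, strip accents and collapse whitespace so variants compare equal."""
    decomposed = unicodedata.normalize("NFKD", text)
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    return " ".join(no_accents.casefold().split())


class EntityDictionary:
    """Maps aliases (old street names, nicknames, spellings) to a canonical entity."""

    def __init__(self) -> None:
        self._canonical: Dict[str, str] = {}

    def learn(self, alias: str, canonical: str) -> None:
        """Record a confirmed alias; this is how the dictionary is 'learned with use'."""
        self._canonical[normalize(alias)] = canonical
        self._canonical[normalize(canonical)] = canonical

    def resolve(self, mention: str) -> Optional[str]:
        """Return the canonical entity, or None if the mention is still unknown."""
        return self._canonical.get(normalize(mention))


# Invented example: a street that changed its name after the period under study.
streets = EntityDictionary()
streets.learn("Avenida Ejemplo Antiguo", "Avenida Ejemplo Nuevo")

print(streets.resolve("AVENIDA  ejemplo antiguo"))
print(streets.resolve("Calle Desconocida"))
```

Unknown mentions return `None` rather than a guess, so an analyst can decide whether to add them.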

This implies, he says, creating a topology of entities in the reading, using technology, of each of the texts “that is capable of rapidly correlating different pages, people, places and dates in a highly flexible way.” He gives an example: “Many of the offenders may have nicknames, and several of them may be written in different ways, but that doesn’t mean that they won’t be linked. What you do is create technology that is capable of suggesting other correlations to the analyst as they occur over time.”
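One common way to suggest that two differently written names may refer to the same person is approximate string matching. The sketch below uses Python's standard `difflib.SequenceMatcher` for that purpose; the article does not say which technique Unsholster uses, and the misspelling and the 0.85 threshold are assumptions chosen for illustration.

```python
from difflib import SequenceMatcher
from typing import List


def similarity(a: str, b: str) -> float:
    """Similarity between 0.0 and 1.0, ignoring case."""
    return SequenceMatcher(None, a.casefold(), b.casefold()).ratio()


def suggest_links(mention: str, known_names: List[str],
                  threshold: float = 0.85) -> List[str]:
    """Return known names that resemble the mention, best match first.

    The result is a suggestion for an analyst to confirm, not a decision.
    """
    scored = [(similarity(mention, name), name) for name in known_names]
    return [name for score, name in sorted(scored, reverse=True)
            if score >= threshold]


# Surnames that appear in the article; "Krasnoff" is a hypothetical OCR variant.
known = ["Contreras", "Espinoza", "Krassnoff"]
print(suggest_links("Krasnoff", known))
print(suggest_links("Gonzalez", known))
```

A lower threshold surfaces more candidates at the cost of more false suggestions, which is why the final call stays with the analyst.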

Therefore, he elaborates, “there is artificial intelligence in the classification of documents; there is high intelligence in transforming documents from an image to searchable data and then, there is a lot of it, in the creation of entities that enable the connection of some documents with others. And, finally, the most necessary thing in a platform is that it should be about the possibility of competing algorithms, with artificial intelligence or without, on this data. But it should not be bound to a technology, because the biggest issue is being open to new technologies of the future. If you keep it closed, it becomes a stumbling block.”

He continues: “Another key point of this platform is that the original data, and the transformed data, are retained. But you can continue to create other data on top of that. There is no time machine that kind of freezes the ability to produce more algorithms and more information with new platforms in the future.”
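The two design principles described here, keeping the original and every transformation while letting different algorithms compete on the same data, can be sketched as a record that is append-only. This is an assumption about how such a platform might be organized, not a description of the real system; `PageRecord`, the layer names and the two stub OCR functions are hypothetical.

```python
from typing import Dict, List, Optional, Tuple


class PageRecord:
    """One scanned page: the untouched original plus layers derived from it."""

    def __init__(self, page_id: str, original: bytes) -> None:
        self.page_id = page_id
        self._original = original
        # layer name -> (name of the layer it was derived from, data)
        self._layers: Dict[str, Tuple[Optional[str], str]] = {}

    @property
    def original(self) -> bytes:
        """Read-only: there is no setter, so the scan is never overwritten."""
        return self._original

    def add_layer(self, name: str, data: str, source: Optional[str] = None) -> None:
        """Append a new layer; existing layers are never replaced."""
        if name in self._layers:
            raise ValueError(f"layer {name!r} already exists")
        if source is not None and source not in self._layers:
            raise KeyError(f"unknown source layer {source!r}")
        self._layers[name] = (source, data)

    def layer(self, name: str) -> str:
        return self._layers[name][1]

    def layer_names(self) -> List[str]:
        return list(self._layers)


# Two stand-in "OCR" algorithms competing on the same original.
def ocr_a(image: bytes) -> str:
    return image.decode("utf-8").upper()


def ocr_b(image: bytes) -> str:
    return image.decode("utf-8").lower()


record = PageRecord("file-001/p-0001", b"Sample Page")
record.add_layer("ocr/a", ocr_a(record.original))
record.add_layer("ocr/b", ocr_b(record.original))
# A later algorithm builds on an earlier layer without modifying it.
record.add_layer("entities/v1", record.layer("ocr/a").title(), source="ocr/a")

print(record.layer_names())
```

Because nothing is overwritten, a future algorithm can always be run against the same originals and its output compared with the earlier layers.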

Contreras and Krassnoff

Five months after technology was first applied to the nearly 47,000 documents of the Human Rights Program, Unsholster can already, thanks to the initial OCR applied to the documents, find thousands of mentions of at least four military officers who were part of Pinochet’s secret police, the feared DINA (National Intelligence Directorate).

Manuel Contreras, its director general, who at the time of his death in 2015 had been sentenced to 526 years in prison for hundreds of crimes, appears 2,800 times. Pedro Espinoza and Miguel Krassnoff, both serving sentences in Punta Peuco prison, appear 2,079 and 2,954 times, respectively. And Marcelo Moren Brito, who tortured Ángela Jeria, the mother of former socialist president Michelle Bachelet, appears 2,284 times.

For now they are only mentions. But from now on, names, facts, dates and places can be linked and related, says Díaz-Araujo.
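The step from counting mentions to relating them can be illustrated with a short sketch: first count how often each name appears in the OCR text, then count how often two names appear on the same page. The page texts and the placeholder names "Alfa" and "Beta" are invented; the article does not describe how the real counts were produced.

```python
import re
from collections import Counter
from itertools import combinations
from typing import Iterable, List


def name_pattern(name: str) -> "re.Pattern[str]":
    """Whole-word, case-insensitive pattern for one name."""
    return re.compile(rf"\b{re.escape(name)}\b", flags=re.IGNORECASE)


def count_mentions(pages: Iterable[str], names: List[str]) -> Counter:
    """Total number of times each name appears across all pages."""
    counts: Counter = Counter()
    for text in pages:
        for name in names:
            counts[name] += len(name_pattern(name).findall(text))
    return counts


def co_mentions(pages: Iterable[str], names: List[str]) -> Counter:
    """Number of pages on which each pair of names appears together."""
    pairs: Counter = Counter()
    for text in pages:
        present = sorted(n for n in names if name_pattern(n).search(text))
        pairs.update(combinations(present, 2))
    return pairs


# Invented OCR output with placeholder names.
pages = [
    "Alfa and Beta were named. ALFA again.",
    "Only Beta here.",
    "No names on this page.",
]
names = ["Alfa", "Beta"]

mentions = count_mentions(pages, names)
links = co_mentions(pages, names)
print(mentions)
print(links)
```

A pair that keeps appearing on the same pages is a lead for an analyst to follow, not proof of a connection.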
