
Milagros Miceli, researcher: ‘It’s not true that AI is going to automate everything. It requires the manual and precarious work of millions of people’

The expert from the German Internet Institute warns that the rise of tools such as ChatGPT will increase the demand for people who generate content for little money to enrich databases

The sociologist and computer scientist Milagros Miceli, pictured in Barcelona. Albert Garcia
Manuel G. Pascual

Artificial intelligence (AI) isn’t as automatic as it claims to be. This technology works thanks to powerful computers that run calculations based on extensive databases. But these databases have to be fixed and tested, a manual job that companies outsource to legions of workers, who are generally paid very little. Sometimes, they earn mere cents for each task performed. This reality was described in the book Ghost Work (2019), authored by Mary Gray, an anthropologist and researcher at Microsoft, and her colleague Siddharth Suri.

By the time this book was published, Milagros Miceli — an Argentine sociologist who holds a doctorate in Computer Science — had already been researching the subject for two years. When she was hired at the newly created German Internet Institute — hosted by the Weizenbaum Institute, named after Joseph Weizenbaum, a pioneer in AI — the social consequences of algorithms were being approached from a very theoretical point of view. But Miceli wanted to go further. “I wondered if anyone was talking to the people behind these algorithms. That’s how I got to the data annotators, who tag images of chairs with the word ‘chair,’ so that the machine learns to distinguish them. And then, I got to know the data workers, a concept we developed ourselves,” she explains.

Since then, Miceli, 41, has pursued this topic. Today, she’s one of the leading experts in the little-known field of data workers. She’s also a senior researcher at the Distributed Artificial Intelligence Research Institute (DAIR), the center founded by Timnit Gebru. Gebru was Google’s head of AI ethics: she was fired shortly after putting her name on a report that didn’t make the company look good.

This past December, Miceli participated in the Thinking Global Digital Justice conference, held in Barcelona. She sat down with EL PAÍS to discuss her work.

Question. What is a data worker?

Answer. A data worker is a person who, basically, produces data to train AI systems. This task can be carried out by recording their own voice, uploading selfies, labelling and classifying data, interpreting it, etc. What many people sometimes don’t imagine is that this is an ongoing job. The [AI] systems require manual work, such as maintaining the data sets, checking them, or fine-tuning them.

Q. What kind of qualification is needed to carry out these tasks?

A. There’s a myth that the people who do this aren’t skilled workers. However, in practice, these [data] workers have at least completed secondary or higher education. I’ve even met people who have PhDs and are doing this type of work.

Q. Where are data workers located?

A. They’re found among vulnerable or poor populations, in areas with a very high unemployment rate. But they have higher education. The nature of the work is actually very difficult. I’ve tried it. It requires not only a level of formal knowledge, but also craftsmanship.

Milagros Miceli, pictured before her participation in the Thinking Global Digital Justice conference. Albert Garcia

Q. Could you give me an example?

A. Labelling and segmenting satellite images are very common tasks. This is extremely difficult. Firstly, it’s tiring for the eyes, as well as for the hand that’s constantly moving the mouse. On top of that, you have to be very careful to separate a tree from a person, or a house from a car, in images that are often blurry. This requires a certain level of knowledge about the architecture and vegetation that’s specific to a country. And, if you do it incorrectly, you don’t get paid at all.

Q. What’s the situation like for data workers?

A. It hasn’t changed since this [phenomenon] started. They’re in a situation of total precariousness, with a lack of protection. In this field, what was once described as the “uberization of work” is what rules. They’re paid for a task completed, not for the time it takes to do the task. In the case of data workers, the time it takes them to log in, to find an available task (which isn’t always available) or to understand the instructions — which can be very complex and are almost always in English — isn’t counted. And all of this involves the risk that the client will later say that the task wasn’t done as requested and withhold pay. This happens in many cases. On top of that, the client still has the right to keep the data that’s been delivered to them.

Some data workers are blocked on platforms for asking questions, for example, about salary. Then, some platforms — such as the largest of all, Amazon Mechanical Turk — don’t pay with money, but with vouchers, in this case, to spend on Amazon. This creates a perfect monopoly. Of course, when something happens to the worker — such as having after-effects from working with psychologically disturbing content — no one helps them. In many cases, they previously signed a confidentiality agreement that prohibits them from revealing the nature of what they do. Some workers have told us that, for this reason — and despite suffering from post-traumatic stress — they’ve avoided seeing therapists. Nor can they put on their CV that they’ve been content moderators for such a large platform.

Q. So, there are content moderators who aren’t on the payroll, but who come in via the freelance route?

A. Most of the content moderators have precarious contracts through third-party companies based in Europe, but many also operate in countries in the Global South. These companies, by the way, are the same ones that are used to label images. In fact, the same people often rotate from one team to the other. Content moderation is often done at the same time as data-labeling: they decide whether what they’re monitoring is hate speech or not, information that’s subsequently used to train the algorithms.

Q. Do we know how many data workers are out there?

A. It’s very difficult to give a number. The World Bank — a conservative institution — says that there are between 150 and 420 million in the world. What we do know is that the numbers have grown exponentially in recent years. This idea that work is going to be automated is a lie. AI requires a lot of manual work.

Q. The discourse of “uberization” says that data workers want to do work at specific times, as a means to supplement their income. This is the same argument that food delivery platforms make. Is this true? Are data workers full-time, or only part-time?

A. That’s another myth, yes. There are hardly any occasional data workers. This has to do with the complexity and high level of professionalism required — as I mentioned before — to carry out these tasks. The more sophisticated AI models become, the more qualified the workers who operate their databases have to be. It’s not like 10 years ago, when you were asked to identify kittens in a series of photos. That doesn’t exist anymore. To make any money doing this, you have to work every day.

Q. The classic example of data work, as you say, was image-tagging. What’s in demand now?

A. Seven years ago, when I started in this field, photo-tagging was the trend. The important thing was to have quantity, not quality. In 2019, we did a study in which we analyzed the instructions given to workers and most of them were along those lines. But recently, there’s been a very marked shift towards tasks that have more to do with linguistics and generative AI: producing data from scratch for a specific purpose. For example, unemployed artists are hired and asked to create images according to certain basic instructions, the so-called “prompts.” This is then given to the Midjourney algorithm to perfect its operation. Or, you hire journalists or writers to write features or short stories, so that the machine can extract patterns. You also record people reading texts in minority dialects or languages to enrich the databases.

Q. Can AI function without this manual work? Does it need human support 24 hours a day?

A. This system is designed to have workers available 24 hours a day, seven days a week, while paying them the bare minimum. And, if they don’t like the conditions, companies can move on to the next country or town. Large scale is what rules, and that only works if we have millions of workers. Of course, there’s another way of doing things. Models work better if they’re trained on smaller, but better curated data sets. For that, you don’t need millions of workers; you need professionals, and you need to communicate with them. It’s the opposite of anonymity and algorithmic mediation.

Q. The latest generative AI models have already been trained with all the data available on the internet. So, the next-generation models will need to include all of that, along with new, artificially-produced data. Do you think generative AI will boost the demand for data workers?

A. If I had to make a prediction, it would be that the number of data workers will continue to grow. Even those who believe that the future lies in synthetic data — generated by machines — know that this is difficult from a technical point of view. Without going into too much detail, training an AI system with data generated by [another] AI system produces a loop: it ends up repeating the same thing, like an infinite game of mirrors. So, writers, artists, journalists, or translators will continue to be needed to generate data, which serves to enrich the databases on which the algorithms are applied.

Now, even if you could train models with synthetic data, you would still need data workers for algorithmic verification tasks. For example, this consists of sitting down with ChatGPT, asking it questions, and saying whether its answers are right or wrong, whether there’s a better option, etc. Language is dynamic and constantly changing, so chatbots need to be constantly perfected. And only humans can do that, because we know and understand the contexts.

Q. Why do you think so little is known about the manual dimension of AI?

A. It’s entirely intentional. They’re selling us the myth of a technology that’s miraculous and incredibly powerful, one that we should fear because it could kill us all. I would add that this technology is based on [hidden] work, on precarious work, on the exploitation of millions of workers. But to sell this myth of ultra-powerful and fearsome technology, it’s necessary to erase all traces of humanity. However, AI wouldn’t be able to work without legions of manual laborers. Why continue hiding them and keeping them in precarious conditions?

Translated by Avik Jain Chatlani.
