IBM’s Darío Gil: ‘This year we’ll have the first Spanish-language AI model with use cases’

The champion of artificial intelligence for Spanish speakers says it will use open-source data from government and other collaborators

Apr 08, 2024 - 20:58 CEST

Darío Gil is Senior Vice President and Director of IBM Research, one of the world's largest corporate research labs.

The human brain is incredibly complex, with 86 billion neurons firing 1,000 impulses per second. The task at hand is to emulate this biological marvel to understand and respond in Spanish, a language spoken around the world by over 500 million people. The Barcelona Supercomputing Center (BSC) and IBM are teaming up to tackle this challenge using the MareNostrum supercomputer and IBM’s global artificial intelligence (AI) expertise.

Dr. Darío Gil, who is from Spain, has played a key role in supporting this project as the leader of IBM’s global research division. In a recent interview after finalizing the agreement to develop the first major Spanish-language AI model, he promised that this technological advancement will be “revolutionary.” The project will rely on an open, collaborative framework for AI-model training that draws on data from the Spanish legislature’s meeting minutes, the Royal Spanish Academy, Spanish-speaking universities and groups, and other public sources. He expects to showcase a first iteration of the model by the end of the year.

Question. Why is IBM investing in a Spanish-language AI model?

Answer. Our approach to AI aligns with the Spanish government’s strategy. Unlike other companies, we believe in an open ecosystem shaping the future of AI. Similar to the shift in operating systems two decades ago, we foresee AI development rooted in open-source collaboration. The world wants to be a part of shaping the future of AI.

Q. What will that look like?

A. We aim to build collaborative environments for creating foundational models using open, public sources with transparent data for pre-training. Increasing model capacity gradually is crucial. Current models need six to nine months of pre-training and yearly releases. By establishing a base model for all developers to enhance, we can collaboratively create the best foundational, open models in Spanish and other languages.

Q. How are you going to develop it?

A. From a computing standpoint, we’ll utilize the current capabilities in the BSC. The minister [Spain’s Digital Transition Minister José Luis Escrivá] has committed to investing in the MareNostrum 5 supercomputer and accelerating the development of version 6. IBM Research will provide supercomputing centers dedicated to artificial intelligence, as well as our latest research and innovations.

Q. Which sectors does the model prioritize?

A. Every sector matters, but the Spanish government aims to ensure that small and medium-sized businesses, which face more obstacles when adopting this technology, reap the benefits along with government entities. We have identified specific use cases to promote the application of artificial intelligence in these sectors.

Q. How much will it cost?

A. I don’t have a concrete number for you. But to give you a general idea, a high-performance foundational model needs thousands of processing units that each cost at least $35,000. These are ambitious projects, not something done casually. At IBM Research, we have committed 3,600 scientists and engineers along with dedicated supercomputing teams. And since it’s an open-source system, we’ll invite Spanish-speaking developers to join parallel efforts.

Q. Where will the data come from to feed the model?

A. The model will use publicly available data, but we also want to use government documents — like legislature meeting minutes — which poses a unique and interesting challenge. Spain’s national libraries and the Royal Spanish Academy are expected to collaborate — all public data.

Q. And how will you preserve the diversity of the Spanish language?

A. From a mathematical perspective, language variations can be integrated within the neural network. The diversity lies within the network, and once fine-tuning is completed, it adjusts to different Spanish variations despite being trained on base documents. For example, you can prompt the model for answers consistent with an Argentine experience; it adapts by learning from the context of the interaction.

Q. But certain words and phrases in Spain don’t have the same meaning in Latin America.

A. The base model will grow, specialize and improve its skills through collective efforts so specific contexts can be understood. The open-source model accommodates this vast diversity. The goal is to expand it extensively, including Brazil. Ibero-America offers vast market opportunities and we must leverage the competitive advantages of the Spanish language.

Q. When will the first model be available?

A. The goal is to have an initial model and some use cases by year end. This [AI] world is very dynamic and no one is interested in projects that take years to produce results.

Q. What will IBM gain from this project?

A. Let me give you the example of Red Hat [an IBM-owned provider of enterprise open source solutions]. With billions in annual revenue, it’s the world’s largest open-source software company. This is the business model we want to emulate, not because we’re altruistic but because we believe in it. We sought strategic partners and found a strong alignment with the Spanish government. Over 80 institutions have committed to the project so far.

Q. Is there an AI bubble right now?

A. Technology is evolving faster than ever before. We now plan in days and weeks, not months and years. All the talk about bubbles comes from doomsday prophets, but AI’s technological base is powerful and solid.
