Google launches Gemini, an AI model capable of outperforming humans in multitask language understanding

The multinational presents versions of the new platform for use across its entire ecosystem, from Bard to phones

A visitor looks at an artificial intelligence exhibit at the Center for Contemporary Culture in Barcelona last October. Gianluca Battista

Raúl Limón

Dec 07, 2023 - 16:00CET

The artificial intelligence (AI) race has become a test of speed. In response to the advances in ChatGPT, now in its fourth version, and a string of announcements by large multinational corporations of their own systems, on Wednesday Google launched Gemini, a multimodal artificial intelligence platform that can process and generate text, code, images, audio and video from different data sources. The Ultra version will be “available early next year,” according to Eli Collins, the vice president of products at Google DeepMind, who also announced that Gemini outperforms humans in massive multitask language understanding (MMLU), a benchmark that covers 57 subjects across science, technology, engineering and mathematics (STEM), the humanities and the social sciences.

“Gemini is our largest and most capable AI model,” says Collins. He explains that it is “inspired by the way people understand and interact with the world. It’s perceived more as a useful collaborator and less as a clever piece of programming.”

During the presentation, Gemini was able to identify a geometric shape, review the working used to calculate its area, spot an error in it, and then propose and explain the correct solution to the problem. It can, in other words, produce results from an image, alphanumeric text and voice data. It also identified different shapes and drawings, some made up only of scattered points, proposed uses for the figures or objects it was shown, wrote a story based on alternative suggestions, and generated up-to-date charts with information that the platform itself searched for.

According to DeepMind’s vice president, Gemini has scored over 90% on MMLU. “It is the first AI model to outperform human experts on this industry-standard benchmark,” he says. Gemini also scored 59.4% on the test for “comprehension of multimodal tasks that include demands requiring deliberate reasoning.”

Gemini is not an application but a platform for bringing the latest artificial intelligence model to existing services, from Bard, Google’s chatbot rival to ChatGPT, to the search engine, service managers, Android phones and large-scale data centers.

Gemini will be available in three “sizes”: Nano, which Android developers can already use; Pro, which will be available starting on December 13; and Ultra, which will be offered at the beginning of next year (the date has yet to be determined). Developers and business customers will be able to access Pro through the Gemini API in Google AI Studio and Vertex AI. Android developers will also be able to build apps with Nano through AICore.

Bard

Sissie Hsiao, head of Google Assistant and Bard, announced that Gemini is already built into the latest version of the chatbot, in English, in 180 countries, and that it will gradually be extended to other languages. She admitted, however, that the company will have to confirm that the rollout complies with the imminent European artificial intelligence regulations, which cover conversational platforms of this kind. Through its inclusion in Bard, Gemini will reach all compatible applications.

The process will take place in two phases: the first will use a Pro version, which will provide the chat with “more advanced reasoning, planning, understanding and other capabilities,” Hsiao explains. The second stage involves improvements that will culminate in the use of the Ultra version early next year.

Gemini was built to be multimodal from the start: rather than being trained separately on each type of data and then having those separate capabilities stitched together, it was trained from the outset on a mix of sources. As Collins explains, “this helps Gemini to seamlessly understand all kinds of inputs much better than existing models and its capabilities are state-of-the-art.”

It can also write code, including complex programs. In this regard, Amin Vahdat, vice president at Google Cloud, asserts that “in the future, we will see programmers making use of high-capability AI models as collaborative tools that help with the entire software development process, from reasoning about problems to assisting with implementation, performance and capabilities.”

With regard to safety, Google says that Gemini passes “the most comprehensive evaluations of all models to date.” The company reports that it has tested the platform against existing and potential risks and maintains an ongoing review process that includes “stress testing.” It has also applied the company’s own AI principles, which set out the ethical standards for its developments.

Despite these advances, Google’s executives note that Gemini is not infallible. They admit that it will produce errors and hallucinations (confident-sounding answers that are not supported by the data). “We have made a lot of progress and Gemini is our best model in that sense, but I would say it is still an unsolved research problem,” Collins concedes.
