Indo-European dialects dispersed across Eurasia in successive waves over the course of 8,000 years

Word origins and ancient DNA reveal the evolutionary path traveled by the languages spoken by half the world

Indo-European languages
A Hittite tablet found in Hattusa, the ancient empire’s capital, now the Turkish town of Boğazkale. nutcat (Getty Images/iStockphoto)

“Noche,” the Spanish word for “night,” comes from the Latin “noctis.” It’s “nyktós” in Greek and “nuit” in French. In Sanskrit, the classical language of India, night is “naktasya.” Around 250 years ago, Gaston-Laurent Coeurdoux, a French Jesuit priest, was one of the first to suggest a common origin for certain languages. Since then, linguists have been on an Indiana Jones-like quest to discover the origins of these Indo-European languages. Now, a collaborative scientific effort combining linguistic analysis, advanced computing, archaeology, and ancient DNA has reconstructed the history of the family’s common ancestor, known as Proto-Indo-European.

According to Glottolog’s database, there are around 400 Indo-European languages spoken today (although the distinction between regional variety, dialect and language may be somewhat arbitrary). Almost half of the world’s population speaks one of these languages. Their original expansion occurred over thousands of years, reaching from present-day Ireland in the west to China in the east, and from Scandinavia in the north to India in the south.

For decades, experts in this field have been divided into two major camps. Some argue that the ancestral Proto-Indo-European language was spoken approximately 9,000 years ago in the northern Fertile Crescent, which encompasses present-day Turkey, Lebanon and Iraq. This region holds great significance as it bore witness to the birth of agriculture. As agricultural practices expanded, the language of early farmers spread far and wide. An alternative hypothesis suggests that around 6,000 to 4,500 years ago, steppe populations migrated in both western and eastern directions. One fascinating example is the mysterious Yamnaya people. They brought their languages to Europe, giving rise to the Italic, Germanic and Celtic branches of the Indo-European family tree.

Indo-European languages
This map shows the expansion and diversification of Indo-European languages, but experts disagree on the routes taken by those who spoke the precursor languages of Persian and Sanskrit.

Led by researcher Paul Heggarty, a team of over 80 scientists, including linguists and geneticists, challenged both prevailing theories: that of the sedentary Anatolian farmers and that of the nomadic herders from the steppes. They contend that while both are flawed, both also contain elements of truth. “We analyzed linguistic data as if it were genetic data,” said Heggarty. To accomplish this, the team built a database of 5,013 cognates, which are words that share a common origin, such as the various forms of the word “night.” The database covers 161 Indo-European languages, including 52 ancient or extinct ones like Tocharian, Gothic and Old Spanish, and it served as the basis for constructing the phylogenetic tree of Indo-European languages. To date the divergence of each branch (ten branches currently exist), the team also assigned dates to languages whose historical dating was previously unknown. “For example, we set the date for Classical Latin at 50 BC,” said Heggarty, who led the project while he was a professor at the Max Planck Institute for Evolutionary Anthropology (Germany). They then worked backwards to find the point of origin. “The approach aims to unify all branches to ascertain the age of the common ancestor of all languages,” said Heggarty, who is now a professor at the Pontifical Catholic University of Peru.
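To give a sense of what treating linguistic data “as if it were genetic data” can look like, here is a minimal Python sketch. It is not the team’s method: it uses only the “night” and “table” words quoted in this article, the cognate-set labels (NOCT, MENSA, TABULA) are made up for illustration, and a naive share-of-differences distance stands in for the study’s far more elaborate statistical analysis.

```python
from itertools import combinations

# Cognate-set membership per meaning, using only words quoted in this article.
# The labels NOCT, MENSA and TABULA are hypothetical names for illustration.
COGNATES = {
    "Latin":    {"night": "NOCT", "table": "MENSA"},
    "Spanish":  {"night": "NOCT", "table": "MENSA"},
    "French":   {"night": "NOCT", "table": "TABULA"},
    "Romanian": {"table": "MENSA"},   # "night" not quoted in the article
    "Catalan":  {"table": "TABULA"},  # "night" not quoted in the article
}


def binary_characters(cognates):
    """Return (columns, matrix): one column per (meaning, cognate set).

    A cell is 1 if the language has that cognate, 0 if it has a different
    word for the meaning, and None if the meaning is not recorded.
    """
    columns = sorted(
        {(meaning, cset) for words in cognates.values()
         for meaning, cset in words.items()}
    )
    matrix = {
        lang: [
            None if meaning not in words else int(words[meaning] == cset)
            for meaning, cset in columns
        ]
        for lang, words in cognates.items()
    }
    return columns, matrix


def distance(a, b):
    """Share of differing cells among those coded for both languages."""
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return None  # nothing to compare
    return sum(x != y for x, y in shared) / len(shared)


columns, matrix = binary_characters(COGNATES)
for lang_a, lang_b in combinations(matrix, 2):
    print(lang_a, lang_b, distance(matrix[lang_a], matrix[lang_b]))
```

Each language becomes a row of ones and zeros, much like the presence or absence of a genetic variant, and languages that share more cognates end up closer together. With only two meanings the result is a toy; the real database holds 5,013 cognates across 161 languages.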

Approximately 7,000 years ago, the Indo-European linguistic lineage had already split into numerous distinct branches, according to the study published in Science. “This would rule out the steppe hypothesis,” said Heggarty. Around 8,120 years ago, the Proto-Indo-European language likely experienced its initial diversification event, give or take a few centuries. Recent studies of ancient DNA suggest that farmers from the Caucasus region — between the Black Sea and Caspian Sea — migrated towards Anatolia, which supports the Anatolian theory. Hittite, an extinct language spoken by the Anatolian civilization, is another significant branch of the Indo-European family. For decades, a large group of linguists argued that Hittite was the common ancestor of the other Indo-European languages, with some even considering it to be the direct heir of Proto-Indo-European.

Ancient DNA, on the other hand, has provided compelling evidence in support of the steppe hypothesis. Since 2015, it has become clear that people from the Pontic steppe, which spans parts of present-day Ukraine, southern Russia, and Kazakhstan, migrated to Central Europe approximately 6,000 to 4,500 years ago. Their genetic legacy is evident in both modern Europeans and the indigenous populations of that era. Notably, studies conducted in 2018 and 2019 revealed how these migrant eastern populations replaced a significant proportion of males on the Iberian Peninsula. Furthermore, they brought with them the languages that gave rise to the Italic, Germanic, and Celtic branches. When they departed from their original homeland, they likely spoke a common or closely related language descended from Proto-Indo-European. However, as their very slow journey progressed (the Celts took centuries to reach present-day Ireland) and they settled in new territories, their languages began to diversify.

Heggarty’s team made a significant contribution by shedding light on this question. By combining phylogenetic analysis of cognates with insights from ancient DNA, they found what may be two distinct origins. Expansion initially originated from the southern Caucasus region, resulting in the separation of five major branches approximately 7,000 years ago. “The Albanians, Greek-speaking Mycenaeans and Hittites do not have a dominant genetic signal from the steppe,” said Heggarty. Several millennia later, another wave emerged, led by nomadic steppe herders from the north. This wave not only influenced the development of western branches of the language tree, but it also possibly played a role in the evolution of Slavic and Baltic languages. It even extended its influence to the Indian subcontinent, while giving rise to the now-extinct Tocharian languages in the Tarim Basin, in what is present-day northwestern China.

Latin and the origins of Romance languages

Diversification didn’t stop there. Even when Latin was confined to Latium, one of the regions in present-day Italy, over 400 languages were spoken on the Italian peninsula. These were mostly Italic languages, which form a branch of the Indo-European family. “The Roman legionnaires spread Latin across the entire continent,” said Kim Schulte, a professor at the Jaume I University (Castelló de la Plana, Spain) and an expert in the diversification of Romance languages. As the imperial language expanded, much like Spanish and English in the Americas, it led to the eradication of local languages. During this process, the seed of diversification was planted.

According to the study, the belief that Romance languages have medieval origins stems from the fact that the earliest writings in Spanish, Catalan, and French date from that period. However, these languages were already spoken centuries earlier. The diversification of Vulgar Latin (popular or colloquial Latin) into distinct Romance languages had already begun long before those first texts were written. Despite the political cohesion of the Roman Empire, various factors contributed to the fragmentation of language. “Several things contributed to language differences,” said Schulte. “One factor is the influence of local languages, like ancient Iberian or Tartessian in Spain. Another factor is political control. For example, Romanian developed differently because the Romans controlled Dacia (present-day Romania) for a relatively short period. Geographic distance also plays a role.” For example, ‘mensa’ means ‘table’ in classical Latin, which is ‘mesa’ in Spanish and ‘masă’ in Romanian. However, other areas closer to Rome adopted ‘tabula,’ a linguistic innovation from Vulgar Latin. Thus, a table is called ‘tavola’ in Italian, ‘taula’ in Catalan, and ‘table’ in French. Schulte says, “Dialects have existed in Spain and other regions since the early days of the Roman Empire.”
