Who is going to win the Copa América? This is what our statistical model predicts
EL PAÍS predicts the outcome of the tournament, in which Argentina and Brazil start as favorites, although in one out of every three simulations another team will win. Predictions are updated every day
What are the chances each nation has of winning the Copa América? We look at this question by simulating the outcome of the tournament thousands of times. This way we can calculate how likely it is that each team will reach the quarterfinals, or win the tournament.
TEAM | EP INDEX | QUARTERFINALS | SEMIFINALS | FINAL | WINNER |
---|
The table above shows which teams are the favorites and by how much. The data also demonstrates the enormous uncertainty of the playing field. Soccer is an unpredictable sport, and our model takes that into account: that’s why, although Argentina is the favorite, 66% of the time the Albiceleste will not lift the trophy.
This is the fourth time EL PAÍS has published predictions for a tournament, after the 2018 and 2022 World Cups and the ongoing European Championship. The full methodology can be read at the bottom of the page, although the essence of the model is simple: we feed it with data on the results of each national team and the quality of its players, and then simulate the tournament thousands of times.
By clicking on the button below you can simulate the Copa América from the quarterfinals using our model to see who wins. If you click again, you will perform a second simulation and in all likelihood another team will win. Your winners accumulate in the table.
If you spend some time playing with this, you will see that the winning percentages come to align with the table at the top of the page. They will do so slowly, because that’s how chance works.
How the favorites have changed
Each day we will update the predictions and save the previous ones. In the selector in the first table on the page, you can see what our model said each day. In addition, the chart below shows how our predictions have evolved throughout the tournament:
Group stage predictions
The top two teams from each group advance to the knock-out stage. In this table we show the probability each team has of finishing in each position in the group, always according to our model:
TEAM | EP INDEX | 1st | 2nd | 3rd | 4th |
---|
TEAM | EP INDEX | 1st | 2nd | 3rd | 4th |
---|
TEAM | EP INDEX | 1st | 2nd | 3rd | 4th |
---|
TEAM | EP INDEX | 1st | 2nd | 3rd | 4th |
---|
Predictions in the final stage
The table shows the odds of each team passing its Round of 16 matchup. From the quarterfinals onwards, we show the two teams with the best chances of reaching that stage, and two percentages: each team’s odds of reaching the stage and beating its opponent.
What other predictions say
We are not the only ones making predictions! Every year more and more companies, communities, and fans crunch their own numbers. In this table we compare the statistical model of sports statistics company Opta, the average prediction of a selection of betting companies, which are usually the most accurate, and the predictions made by our own model.
TEAM | EL PAÍS | BOOKIES | OPTA |
---|
How does our model work?
Our predictions are the result of thousands of simulations, which are only partly random. In each match, the probability of one team or the other winning depends on their statistics. For example, if Argentina plays against Bolivia, the probability of the former winning is around 83%.
The model consists of three parts:
1. How strong is each team? To decide this, we use two metrics: their recent results (measured with an ELO ranking, a method originally from chess) and the quality of their players (measured by their value in euros, with data from the Transfermarkt website).
2. Who wins each match? We have trained a model on thousands of matches so that, given two teams and their strength metrics, it estimates how likely each outcome is. The model gives the probability of victory, draw and defeat, and even the score. For example, in that hypothetical duel between Argentina and Bolivia, the most likely outcomes are 2-0 and 3-0 to Argentina, each with a probability of about 14%.
3. And to predict the entire Copa América? We simulate it match by match, potential opponent by potential opponent. We repeat this thousands of times to reach 200,000 possible tournament outcomes, from which we estimate the probability of each event.
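To make the third step concrete, here is a minimal sketch of simulating a knockout bracket many times and counting the winners. It is not the EL PAÍS model: the ratings in `RATINGS`, the bracket order, and the logistic `win_probability` curve are all illustrative assumptions, and the group stage, draws and extra time are left out.

```python
import random
from collections import Counter

# Hypothetical strength ratings (illustrative numbers, not the model's inputs).
RATINGS = {
    "Argentina": 2140, "Brazil": 2030, "Uruguay": 1990, "Colombia": 2000,
    "Ecuador": 1890, "USA": 1800, "Mexico": 1790, "Bolivia": 1590,
}


def win_probability(rating_a, rating_b):
    """Assumed Elo-style logistic curve: chance that team A eliminates team B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))


def play_knockout(teams, rng):
    """Play one single-elimination bracket; adjacent teams meet in each round."""
    while len(teams) > 1:
        next_round = []
        for a, b in zip(teams[0::2], teams[1::2]):
            p = win_probability(RATINGS[a], RATINGS[b])
            next_round.append(a if rng.random() < p else b)
        teams = next_round
    return teams[0]


def simulate(n_runs, seed=0):
    """Return each team's share of tournament wins over n_runs simulations."""
    rng = random.Random(seed)
    bracket = list(RATINGS)  # assumed bracket order: first vs second, and so on
    wins = Counter(play_knockout(bracket, rng) for _ in range(n_runs))
    return {team: wins[team] / n_runs for team in bracket}


if __name__ == "__main__":
    shares = simulate(20_000)
    for team, share in sorted(shares.items(), key=lambda kv: -kv[1]):
        print(f"{team:10s} {share:6.1%}")
```

Running it with more simulations makes the shares steadier from run to run, which is why a large number of repetitions is needed before the percentages settle.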
Frequently asked questions
So you say Argentina will win? No, no. Our model says that Argentina is the most likely winner, but also that it has only a one-in-three chance of winning. It is important to interpret this correctly: in reality, an Argentina victory is no more likely than seeing a goalkeeper save a penalty.
This data goes to show that the winner of the Copa América is difficult to predict. And that's no surprise. Firstly, it is a tournament designed for luck to play a part: it is not a regular league, it has no playoffs, no home-and-away matches. Secondly, national teams play few genuinely big games and their performance is more uncertain than that of a club. And thirdly, football is full of surprises. Hardly anyone would sit down to watch a match if the result was a foregone conclusion.
Have you done this before? Yes. We used a similar model for the 2018 and 2022 World Cups. The model has proven to be well calibrated: outcomes to which we assigned a probability of between 0% and 15% occurred 4% of the time, for example, and those with a probability of between 85% and 100% occurred 94% of the time. In 2018, we did much better than chance, better than the FIFA ranking and better than two big banks (UBS and Goldman Sachs). In 2022, the final was contested between two of the three teams we saw as the best, and we even beat the bookies.
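A calibration check of the kind described above can be sketched as follows: group forecasts by the probability assigned, then compare each group with how often the event actually happened. The function name, the two middle bin edges and the synthetic data are assumptions made for illustration; only the 0-15% and 85-100% ranges come from the text.

```python
import random


def calibration_table(forecasts, bin_edges=(0.0, 0.15, 0.5, 0.85, 1.0)):
    """Summarise a list of (predicted_probability, happened) pairs by bin.

    Returns rows of (bin_low, bin_high, n_forecasts, observed_frequency).
    """
    rows = []
    for lo, hi in zip(bin_edges, bin_edges[1:]):
        is_last = hi == bin_edges[-1]
        hits = [
            happened
            for p, happened in forecasts
            if lo <= p < hi or (is_last and p == hi)
        ]
        if hits:
            rows.append((lo, hi, len(hits), sum(hits) / len(hits)))
    return rows


def synthetic_forecasts(n, seed=0):
    """Made-up forecasts from a perfectly calibrated forecaster (illustration only)."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        p = rng.random()
        data.append((p, rng.random() < p))
    return data


if __name__ == "__main__":
    for lo, hi, n, freq in calibration_table(synthetic_forecasts(20_000)):
        print(f"predicted {lo:.0%}-{hi:.0%}: n={n:5d}, observed {freq:.1%}")
```

For a well-calibrated model, the observed frequency in each row falls inside, or very close to, the predicted range for that row.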
Should I bet using your forecasts? No. Our model is relatively sophisticated and can work well. But the bookies have shown in the past that they are very difficult to get the better of. Moreover, in order not to lose money, it is not enough to beat them; you also have to compensate for the margin that bookmakers reserve for themselves when setting prices.
Statistical models are useful as a reference, for transparency, and because they allow us to calculate details that betting companies do not answer. But, when it comes to getting it right, bookmakers use a hybrid approach: they combine their own models with the fine-tuning performed by their experts, to take into account the extra information they have (such as details of playing styles, form, or injuries).
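The bookmaker's margin mentioned above can be made concrete with a little arithmetic. The sketch below converts decimal odds into implied probabilities, whose sum exceeds 100% by the margin, and computes the expected profit of a bet. The odds in the example are invented for illustration, and proportional rescaling is just one simple way to strip the margin out.

```python
def implied_probabilities(decimal_odds):
    """Return (margin-free probabilities, bookmaker margin) for one market."""
    raw = [1.0 / odds for odds in decimal_odds]
    total = sum(raw)                   # greater than 1 when the bookmaker takes a margin
    fair = [p / total for p in raw]    # assumption: margin spread proportionally
    return fair, total - 1.0


def expected_profit(true_probability, decimal_odds, stake=1.0):
    """Average profit per bet if the event really has the given probability."""
    return stake * (true_probability * decimal_odds - 1.0)


if __name__ == "__main__":
    odds = [1.80, 3.60, 4.50]          # hypothetical win / draw / loss prices
    fair, margin = implied_probabilities(odds)
    print("margin:", round(margin, 4))
    print("fair probabilities:", [round(p, 3) for p in fair])
    # Betting on the favourite at its margin-free probability still loses on average.
    print("expected profit:", round(expected_profit(fair[0], odds[0]), 4))
```

Even a forecaster whose probabilities exactly match the margin-free ones loses on average, so a model has to be better than the market by more than the margin to come out ahead.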
More technical details
What is an ELO ranking? It is a metric that captures the strength of each team according to their results. Each team has a certain number of points — its ELO points — and with each match an exchange takes place. The winner takes points from the loser. If the victory is a surprise (because the weaker team wins) the teams exchange more points. ELO rankings work well and are increasingly used in sports and in video games, for example, to organize matches between players of similar level. Our model uses the ELO ranking of the Eloratings website.
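The points exchange can be illustrated with the textbook Elo update. This is a sketch of the general method rather than the exact rules used by Eloratings: the fixed factor `k=40` and the ratings in the example are assumptions, and refinements such as weighting by match importance or goal difference are omitted.

```python
def expected_score(rating_a, rating_b):
    """Expected result for team A (between 0 and 1) under the standard Elo curve."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))


def update(rating_a, rating_b, score_a, k=40):
    """Return new ratings after a match.

    score_a is 1 for a win by A, 0.5 for a draw, 0 for a loss.
    k is an assumed constant controlling how many points change hands.
    """
    delta = k * (score_a - expected_score(rating_a, rating_b))
    return rating_a + delta, rating_b - delta


if __name__ == "__main__":
    # Hypothetical ratings: a strong favourite against a weaker side.
    print("favourite wins:", update(2000, 1600, 1))
    print("upset:         ", update(2000, 1600, 0))
```

When the favourite wins, only a few points move; when the weaker side wins, many more do, and in both cases the points gained by one team equal the points lost by the other.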
Why do you use squad values? Because national teams play relatively few competitive matches and that makes it difficult to measure their performance before a big event. One way to add information to the model is to use the transfer market value of the players in the squad. We take the data from the popular Transfermarkt website, adjusted for the age of the players (those over 30 are cheaper because they have fewer years of career ahead of them).
How does the match simulator work? We have fitted a simple model that estimates the number of goals each team will score based on whether they are better or worse than their opponents, in ELO points and squad value. The model also takes into account whether the matches have a host or are played on neutral ground, but that hardly influences the Copa América. We use a Poisson-type distribution, which fits goal counts reasonably well and has been used in statistical models and academic studies.
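As an illustration of a goals-based simulator, the sketch below assumes each team's goals follow an independent Poisson distribution and derives win, draw and loss probabilities from the two scoring rates. The function `expected_goals`, with its `base` and `slope` values, is a hypothetical stand-in for the fitted model, which also uses squad value and host status.

```python
import math


def poisson_pmf(k, lam):
    """Probability of exactly k goals when the scoring rate is lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def expected_goals(elo_diff, base=1.3, slope=0.0025):
    """Hypothetical log-linear link from a rating gap to a goal rate."""
    return base * math.exp(slope * elo_diff)


def outcome_probabilities(lam_a, lam_b, max_goals=15):
    """Return (P(A wins), P(draw), P(B wins)), assuming independent Poisson goals."""
    win = draw = loss = 0.0
    for i in range(max_goals + 1):
        for j in range(max_goals + 1):
            p = poisson_pmf(i, lam_a) * poisson_pmf(j, lam_b)
            if i > j:
                win += p
            elif i == j:
                draw += p
            else:
                loss += p
    return win, draw, loss


def most_likely_score(lam_a, lam_b, max_goals=15):
    """Return the single most probable scoreline as (goals_a, goals_b)."""
    grid = (
        (poisson_pmf(i, lam_a) * poisson_pmf(j, lam_b), (i, j))
        for i in range(max_goals + 1)
        for j in range(max_goals + 1)
    )
    return max(grid)[1]


if __name__ == "__main__":
    lam_a, lam_b = expected_goals(300), expected_goals(-300)
    print("win/draw/loss:", [round(p, 3) for p in outcome_probabilities(lam_a, lam_b)])
    print("most likely score:", most_likely_score(lam_a, lam_b))
```

Because the model works in goals, the same rates also give scoreline probabilities, which a model that predicts only the winner cannot provide.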
In the end, for our model, the probability of a team winning a match depends on numbers: the differences with the opponent in ELO points and in squad value. In the Copa América, all matches are considered neutral, except those played by the host, the United States. The Americans have a double benefit: we are giving them the advantage a team has when playing a home game — in a league or World Cup qualifier, for example — but we also give them an extra boost equivalent to 0.2 goals per game. We chose that number based on an analysis by FiveThirtyEight, which estimated that hosts in the World Cup tournament historically score 0.4 goals more per game than expected.
Why do we use a model that predicts goals and not wins directly? It has two advantages: it helps us to predict the group stage and extra time (as 30-minute matches). Some studies (since Dixon & Coles, 1997) say that models that predict goals underestimate the draws that occur. We have found this to be the case in the Champions League and in the major European leagues, but the effect is smaller in World Cups, and we have decided not to make this adjustment for the Copa América.
How accurate can we expect the model to be? To calibrate it, we used a database of 18,000 national team matches played since 2004, including hundreds from World Cups, Copa América championships and European Championships.
With training data, the model correctly predicts the outcome of 60% of home matches and 56% of neutral ground matches. Evaluated in probabilistic terms, the model's predictions score between 0.176 and 0.186, expressed as a Ranked Probability Score (as here, and here). That is a reasonable level of accuracy compared with other models or betting companies.
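The probability score quoted above, usually called the Ranked Probability Score (RPS), can be computed as in this sketch. It uses the standard definition for ordered outcomes (here win, draw, loss), where 0 is a perfect forecast and lower is better. The example forecasts are invented for illustration.

```python
def ranked_probability_score(probs, outcome_index):
    """RPS for one match.

    probs: forecast probabilities over ordered outcomes, e.g. (win, draw, loss).
    outcome_index: position of the outcome that actually happened.
    """
    n_outcomes = len(probs)
    cumulative_forecast = 0.0
    cumulative_observed = 0.0
    total = 0.0
    for i in range(n_outcomes - 1):
        cumulative_forecast += probs[i]
        cumulative_observed += 1.0 if i == outcome_index else 0.0
        total += (cumulative_forecast - cumulative_observed) ** 2
    return total / (n_outcomes - 1)


def mean_rps(forecasts):
    """Average RPS over a list of (probs, outcome_index) pairs."""
    return sum(ranked_probability_score(p, o) for p, o in forecasts) / len(forecasts)


if __name__ == "__main__":
    # Hypothetical forecasts: (win, draw, loss) probabilities and what happened.
    matches = [((0.6, 0.25, 0.15), 0), ((0.4, 0.3, 0.3), 1), ((0.5, 0.3, 0.2), 2)]
    print("mean RPS:", round(mean_rps(matches), 4))
```

Because the outcomes are ordered, the score penalises a forecast less when it misses by one step (predicting a win when the match is drawn) than when it misses by two (predicting a win when the match is lost).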
You can also read the evaluation we did of our 2018 model (Spanish) and in 2022 (also in Spanish), after each World Cup.