The FIFA World Cup has finally arrived, and for the next month the question we’ll all be asking is: who is going to take home the glory? In a research paper published this week, Andreas Groll and colleagues at the Technical University of Dortmund in Germany combined machine learning with statistical analysis to identify Spain (at 17.8 percent) as the most likely winner. The combined odds of the bookmakers, by contrast, make Brazil the favourites to win the 2018 World Cup, at a probability of 16.6 percent, with Germany and Spain at 12.8 percent and 12.5 percent respectively. New machine learning techniques could, however, outperform conventional statistical methods and put a spanner in the works as punters look to outsmart bookies using artificial intelligence.

“We are blown away by the exceptional advances in machine learning, and as a business that operates using the latest innovative technology we can definitely see merit in the AI predictions. We could see punters backing the bots and betting against the bookies on this one,” commented Dyan Liebenberg, Marketing Manager of Sportingbet SA, South Africa’s largest sportsbook.

The Random-Forest approach

The researchers ran 100 000 simulations of the FIFA World Cup using three different approaches, based on performances from 2002 to 2014. The first technique, the random-forest approach, combines machine learning with statistical methods. It builds on the decision tree, in which a future event is predicted by working through a series of branches, with the outcome at each branch calculated by reference to a set of training data. Decision trees are prone to overfitting at the later stages of the process, where the training data is sparse and scattered, so decisions end up fitted to noise and become unreliable. The random-forest approach avoids this by building many trees, each from a randomly selected set of branches, and averaging their results rather than relying on a single tree that uses every branch. It also reveals which factors are most important in predicting the outcome.
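To make the averaging idea concrete, here is a minimal sketch in plain Python. It is not the researchers' model: each "tree" is cut down to a single split on one randomly chosen factor, the data is an invented toy set, and all function names are hypothetical. It only illustrates how many randomised trees, each trained on a random resample of the data, are averaged into one prediction, and how the trees hint at which factor matters most.

```python
import random
from statistics import mean, median


def fit_stump(rows, targets, feature):
    """Fit a one-split tree: split one feature at its median, store both leaf means."""
    threshold = median(row[feature] for row in rows)
    left = [t for row, t in zip(rows, targets) if row[feature] <= threshold]
    right = [t for row, t in zip(rows, targets) if row[feature] > threshold]
    fallback = mean(targets)  # used only if one side of the split is empty
    return (feature, threshold,
            mean(left) if left else fallback,
            mean(right) if right else fallback)


def fit_forest(rows, targets, n_trees=200, seed=0):
    """Train many stumps, each on a bootstrap resample and a random feature."""
    rng = random.Random(seed)
    n_rows, n_features = len(rows), len(rows[0])
    forest = []
    for _ in range(n_trees):
        sample = [rng.randrange(n_rows) for _ in range(n_rows)]  # draw with replacement
        feature = rng.randrange(n_features)                      # random factor for this tree
        forest.append(fit_stump([rows[i] for i in sample],
                                [targets[i] for i in sample], feature))
    return forest


def predict(forest, row):
    """The forest's prediction is the average of every tree's prediction."""
    return mean(low if row[f] <= thr else high for f, thr, low, high in forest)


def split_gaps(forest, n_features):
    """Crude importance score: average gap between the two leaf means, per feature."""
    gaps = {f: [] for f in range(n_features)}
    for f, _, low, high in forest:
        gaps[f].append(abs(high - low))
    return {f: (mean(v) if v else 0.0) for f, v in gaps.items()}


def make_toy_data(n_rows=300, seed=1):
    """Invented data: feature 0 drives the target, feature 1 is pure noise."""
    rng = random.Random(seed)
    rows = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(n_rows)]
    targets = [row[0] + rng.gauss(0, 0.1) for row in rows]
    return rows, targets


if __name__ == "__main__":
    rows, targets = make_toy_data()
    forest = fit_forest(rows, targets)
    print("prediction for a strong side:", predict(forest, (0.8, 0.0)))
    print("prediction for a weak side:  ", predict(forest, (-0.8, 0.0)))
    print("importance by feature:       ", split_gaps(forest, 2))
```

A real random forest grows much deeper trees and picks the best split from a random subset of factors at every branch, but the averaging step is the same.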

The researchers’ model included directly relevant factors such as FIFA’s ranking of national teams, the players’ average age, the number of Champions League players in each squad, whether a team has home advantage, and so on. They also included less directly relevant factors such as countries’ population, GDP and the coach’s nationality. They then added other ranking attempts, such as those from bookmakers. The more parameters a decision tree contains, the easier it is to see which ones have the biggest impact on the outcome and which can be ignored in future.

Best predictors

To “improve the predictive power substantially”, the researchers then took the best-performing prediction methods and combined them, revealing Spain as slight favourites with a 17.8% probability of success and a 73% chance of reaching the quarter-finals. But wait, it’s not that simple. Germany has only a 58% chance of reaching the quarter-finals because it would face stronger opposition than Spain. If, however, both make the quarters, they would then have an equal chance of winning. “Spain is slightly favored over Germany mainly due to the fact that Germany has a comparatively high chance to drop out in the round-of-sixteen. Running the random-forest approach for the most probable tournament course shows Germany [winning] instead of the Spanish, but because of the huge number of permutations of games, this course is still extremely unlikely,” say Groll and co.
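Probabilities like these come from replaying the tournament many times and counting how often each team gets through. The sketch below shows the idea under loud assumptions: the eight teams, their strength ratings, the bracket and the rule that a team wins a match in proportion to its share of the two ratings are all invented for illustration and are not taken from the paper.

```python
import random
from collections import Counter

# Hypothetical strength ratings for eight made-up teams (not from the paper).
STRENGTHS = {"A": 5.0, "B": 4.0, "C": 3.0, "D": 3.0,
             "E": 2.0, "F": 2.0, "G": 1.0, "H": 1.0}

# Hypothetical knockout bracket: neighbouring teams meet in the first round.
BRACKET = ["A", "H", "D", "E", "B", "G", "C", "F"]


def play(team_a, team_b, rng):
    """Assumed match model: chance of winning is proportional to strength."""
    p_a = STRENGTHS[team_a] / (STRENGTHS[team_a] + STRENGTHS[team_b])
    return team_a if rng.random() < p_a else team_b


def simulate_once(rng):
    """Play one whole knockout tournament; return the champion and both finalists."""
    teams = list(BRACKET)
    finalists = ()
    while len(teams) > 1:
        if len(teams) == 2:
            finalists = tuple(teams)
        teams = [play(teams[i], teams[i + 1], rng)
                 for i in range(0, len(teams), 2)]
    return teams[0], finalists


def estimate(n_runs=100_000, seed=2018):
    """Estimate title and final-reaching probabilities by repeated simulation."""
    rng = random.Random(seed)
    titles, finals = Counter(), Counter()
    for _ in range(n_runs):
        champion, finalists = simulate_once(rng)
        titles[champion] += 1
        finals.update(finalists)
    title_prob = {t: titles[t] / n_runs for t in STRENGTHS}
    final_prob = {t: finals[t] / n_runs for t in STRENGTHS}
    return title_prob, final_prob


if __name__ == "__main__":
    title_prob, final_prob = estimate()
    for team in sorted(STRENGTHS):
        print(f"{team}: reaches final {final_prob[team]:.1%}, "
              f"wins title {title_prob[team]:.1%}")
```

Because each team's chances depend on who stands in its way, a strong side placed in a tough part of the bracket can end up with a lower title probability than its rating alone would suggest, which is the effect described for Germany above.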

“The researchers seem so confident in their methodology with machine learning that they must literally be betting on it!” added Liebenberg. “Sportingbet’s new ‘Cup of Comrades’ free prediction game is a perfect opportunity to put these predictions to the test! Entry is free and users stand a chance to win R10 000 daily and a share of a whopping R100 000 at the end of the tournament.”

Bots vs humans. Game on! Visit for free entry.