“Data” is entering just about any discussion these days – including football (known in very few countries as ‘soccer’).
Most football clubs have seen Moneyball and are well aware that you cannot base your decisions solely on what you see and on the result of a game. Goals are simply too infrequent to be the foundation for match analysis: at the professional level, the most common scoreline is 1-0, followed by 2-1 and 2-0.
To get better data, clubs turn to companies such as Opta and Understat, which provide more detailed analysis. One of the most interesting metrics is Expected Goals – or xG – which estimates how likely each chance is to result in a goal. It has proven to add much more depth to the analysis of a match and of the underlying performance of teams and individual players.
What if the data goes against the common belief system?
If we take the three biggest names, xG will show that:
- Messi scored 31.5 more goals than expected over the last 5 seasons (176 goals, with only 2015/16 below expected, by 1 goal)
- Cristiano Ronaldo scored 4.4 more than expected (155 goals, with just one season above expected: 8.5 goals in 2014/15)
- Neymar scored as expected (93 goals, varying from -4 to +3 per season).
In other words: Messi is really good at finishing, while the other two are on par with what you would expect. However, consider these data points:
- In 2018/19 Robert Lewandowski scored 11 goals fewer than expected, yet still ended up as top scorer in the Bundesliga and won the title with Bayern
- Harry Kane consistently scores at the expected rate, except in 2016/17, when he outperformed expectations by close to 50%, or 9 goals!
- Over the last 5 seasons, Paul Pogba has been a 10% worse finisher than expected, but as a chance creator he has 45% more assists than expected
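To make the finishing numbers comparable across players, the over-performance can be expressed as a share of expected goals. The sketch below is only an illustration. It uses the five-season totals quoted for Messi, Ronaldo and Neymar, and backs out the implied xG as goals minus over-performance. The function name and the dictionary layout are assumptions, not something a data provider publishes in this form.

```python
# Five-season totals as quoted in the text: goals scored and goals minus xG.
players = {
    "Messi": {"goals": 176, "goals_minus_xg": 31.5},
    "Ronaldo": {"goals": 155, "goals_minus_xg": 4.4},
    "Neymar": {"goals": 93, "goals_minus_xg": 0.0},
}


def finishing_summary(goals, goals_minus_xg):
    """Return (implied xG, over-performance as a share of implied xG)."""
    implied_xg = goals - goals_minus_xg
    return implied_xg, goals_minus_xg / implied_xg


for name, totals in players.items():
    implied_xg, share = finishing_summary(totals["goals"], totals["goals_minus_xg"])
    print(f"{name}: implied xG {implied_xg:.1f}, over-performance {share:+.1%}")
```

Read this way, Messi's 31.5 extra goals amount to roughly a fifth more than his chances were worth, while Ronaldo's surplus is only a few percent.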
So if you have that kind of data, what decisions do you make that are different? Should Lewandowski get a bonus because he became
If you go through the thousands of data
More data (initially) means more complexity
xG provides more insight than the number of goals scored alone, but every coach will also say that they need more. The most advanced clubs have GPS tracking of players, heart rate monitors, video analysis, injury records, assessments of leadership qualities, the amount of merchandise a player can sell, etc.
Furthermore, o
Once you start working with the data, patterns will emerge and you will start to see which data points you need to pay attention to, and when. Maybe fitness is more important during a busy period, maybe pass completion rate during the warm months, etc.
You have the data but what is the right question to ask?
Football is decided by very few goals, which exaggerates a problem known as outcome bias: judging the quality of a performance or decision by its known outcome, which distorts the narrative.
A good example is a recent game between Liverpool and Leicester. Liverpool ended up winning due to an injury-time penalty. Most pundits and reports tell a story of a brave Leicester that deserved a point from the game. Liverpool was lucky and had the “margins on its side”.
If you look at the game as a whole, Liverpool’s xG was 3.75 and Leicester’s 0.10. In other words, Leicester was lucky to be in it at all, and Liverpool could consider themselves unlucky. However, if you look at it from the “10 minutes to go, all square” angle, the answer is slightly different. Liverpool created 0.84 xG after the equalizer, with the majority attributed to the late penalty (0.76). The nature of a penalty is that it can be awarded in a situation that was a much smaller chance to begin with – so yes, after 1-1, Liverpool could be considered lucky.
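The two angles on the Liverpool game come down to simple arithmetic on the quoted xG figures. The sketch below splits the match total into the part before the equalizer and the part after it, and isolates the penalty. The function name and the labels are illustrative assumptions, and the "before" and "open play" values are derived from the quoted numbers rather than taken from a provider.

```python
def split_xg(total_xg, xg_after, penalty_xg):
    """Split a team's match xG around a game event and isolate the penalty."""
    return {
        # xG created before the event (here: the equalizer)
        "before_equalizer": round(total_xg - xg_after, 2),
        # xG created after the event, excluding the penalty
        "after_equalizer_open_play": round(xg_after - penalty_xg, 2),
        # share of the post-event xG that came from the penalty
        "penalty_share_after_equalizer": round(penalty_xg / xg_after, 2),
    }


# Figures quoted in the text for Liverpool.
liverpool = split_xg(total_xg=3.75, xg_after=0.84, penalty_xg=0.76)
for label, value in liverpool.items():
    print(f"{label}: {value}")
```

Under these figures, almost all of Liverpool's threat after 1-1 came from the penalty, while the bulk of their match total was created earlier. That is why the same game supports both the "unlucky" and the "lucky" reading, depending on the question asked.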
At a “season” level, Manchester United is in “crisis” after a weak start, ranking number 10 after 11 rounds with their lowest points total for more than 30 years. However, if we look at the xG of the individual games they have played, they have outperformed the opposing team, and been “unlucky”, in 8 out of 11 matches. Do you fire the coach based on that?
Knowing the right question to ask is critical. Football results are like business results in the sense that they are the product of many small decisions made by players, or your employees, along with external factors such as weather and referees (VAR…), or competition and legislation – and a lot of the time simply luck. In many cases you can win or lose a sale by very little; however, the customer does not buy 52% of a competitor’s car and 48% of yours. The same is the case in football, where you can be totally outplayed and still walk away with three points.
You can’t improve without data
Having more data will make things more complicated. However, you do need to be able to ask the right questions. Data scientists can crunch the data for you and ask you some “stupid” questions, but they are really only effective when combined with the existing knowledge in the organisation, in the same way that football coaches are still football coaches, supported by data. Not the other way around.
So why not ask: why do you get so excited at corners, when only 1 out of every 70 results in a goal, or one goal every 4.5 to 7 matches?
Or is 2-0 really the most dangerous lead? No, it obviously isn’t…