The challenges of having data

“Data” comes up in just about every discussion these days, including football (known in only a few countries as ‘soccer’).

Most football clubs have seen Moneyball and are well aware that they cannot base their decisions solely on what they see and on the result of a game. Goals are simply too infrequent to be the foundation for match analysis: at the professional level, the most common scoreline is 1-0, followed by 2-1 and 2-0.

To get better data, clubs turn to companies such as Opta and Understat, which provide more detailed analysis. One of the most interesting metrics is Expected Goals (xG), which has added much more depth to the analysis of a match and of the underlying performance of teams and individual players.

Duncan Alexander explaining Opta’s version of xG. Other companies may use slightly different definitions, but they essentially do the same thing.
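As a rough illustration of the general idea (not Opta’s or Understat’s actual model): each shot is given a probability of becoming a goal, and a player’s or team’s xG is the sum of those probabilities. The shots and values below are hypothetical.

```python
# Hypothetical per-shot xG values for one team in one match.
# Real providers estimate these probabilities with their own models.
shots = [
    ("header from a corner", 0.04),
    ("shot from outside the box", 0.03),
    ("one-on-one with the keeper", 0.35),
    ("penalty", 0.76),
]


def total_xg(shot_values):
    """A team's or player's xG is the sum of the xG of every shot taken."""
    return sum(xg for _, xg in shot_values)


print(round(total_xg(shots), 2))  # 1.18
```

Read this way, 1.18 xG means that an average finisher would be expected to score a little over one goal from these four chances.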

What if the data goes against common belief?

Take the three biggest names, and xG shows that:

  • Messi scored 31.5 more goals than expected over the last five seasons (176 goals; only in 2015/16 was he below expected, by one goal)
  • Cristiano Ronaldo scored 4.4 more than expected (155 goals, with just one season above expected: +8.5 goals in 2014/15)
  • Neymar scored as expected (93 goals, varying from -4 to +3 per season)
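The figures above give goals and goals above expected, so the implied xG and a relative over- or underperformance can be backed out with simple arithmetic: xG = goals minus goals above expected, and the percentage is that difference divided by xG. This is a sketch; treating Neymar’s “as expected” as exactly zero is an assumption.

```python
def finishing_vs_xg(goals, goals_above_expected):
    """Return (implied xG, % over/under expected) from goals and goals minus xG."""
    xg = goals - goals_above_expected
    pct = 100 * goals_above_expected / xg
    return xg, pct


# Five-season totals quoted above: (goals, goals above expected).
# Neymar's "as expected" is treated as exactly 0 (an assumption).
players = {
    "Messi": (176, 31.5),
    "Cristiano Ronaldo": (155, 4.4),
    "Neymar": (93, 0.0),
}

for name, (goals, above) in players.items():
    xg, pct = finishing_vs_xg(goals, above)
    print(f"{name}: xG {xg:.1f}, {pct:+.1f}% vs expected")
```

This gives Messi roughly +21.8% against an implied 144.5 xG, and Ronaldo roughly +2.9%, which is why the same absolute surplus can look very different once it is expressed as a percentage.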

In other words, Messi is a genuinely excellent finisher, while the other two are on par with what you would expect. However, consider these data points:

  • In 2018/19 Robert Lewandowski scored 11 goals fewer than expected, yet still finished as the Bundesliga’s top scorer and won the title with Bayern
  • Harry Kane consistently scores at the expected rate, except in 2016/17, when he outperformed his expected goals by close to 50%, or 9 goals!
  • Over the last five seasons, Paul Pogba has been a 10% worse finisher than expected, but as a chance creator he has 45% more assists than expected

So if you have that kind of data, what do you decide differently? Should Lewandowski get a bonus for becoming top scorer, or a pay cut for wasting his teammates’ work? Should you build a team around Harry Kane or Paul Pogba?

Going through thousands of data points shows that very few players are genuinely exceptional finishers. Even players who are consistently above expected, such as Son Heung-Min (44%), Eden Hazard (34%), Mo Salah (17%), Gareth Bale (17%) and Sadio Mané (15%), typically contribute only 2-4 goals more per season than expected. The likes of Sergio Agüero and Aubameyang are within ±5% of expected every season. Top coaches know this and focus on increasing the likelihood of converting a chance, rather than on the number of shots taken. Pundits and the rest of us… not so much.
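Since a team’s xG is roughly the number of shots multiplied by the average quality of each shot, the coaches’ focus on chance quality can be illustrated with two shot profiles. The shot counts and per-shot values below are made up for illustration, not taken from real team data.

```python
# Hypothetical shot profiles: (number of shots, average xG per shot).
profiles = {
    "many low-quality shots": (18, 0.06),
    "fewer high-quality chances": (8, 0.18),
}


def expected_goals(n_shots, avg_xg_per_shot):
    """Expected goals = number of shots x average chance quality."""
    return n_shots * avg_xg_per_shot


for label, (n, quality) in profiles.items():
    print(f"{label}: {expected_goals(n, quality):.2f} xG")
```

Under these assumed numbers, the side taking fewer than half as many shots still generates more expected goals (1.44 vs 1.08), because each chance is three times as likely to be converted.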

More data (initially) means more complexity

xG provides more insight than the number of goals scored alone, but every coach will also say that they need more. The most advanced clubs have GPS tracking of players, heart-rate monitors, video analysis, injury records, assessments of leadership qualities, data on how much merchandise a player can sell, etc.

Furthermore, it becomes even more complicated once the data is aggregated to team level and used on a per-match basis, which, at the end of the day, is what matters in football (and to the fans).

Once you start working with the data, patterns will emerge, and you will start to see which data points you need to pay attention to at which specific points in time. Maybe fitness matters more during a busy period, or passing completion rate during the warm months.

At first, more data only increases complexity, but as patterns emerge, so does simplicity, and that is where the real value lies.

You have the data, but what is the right question to ask?

Football matches are decided by very few goals, which amplifies a problem known as outcome bias: judging the quality of a decision or a performance by its outcome once that outcome is known, which distorts the narrative.

A good example is a recent game between Liverpool and Leicester, which Liverpool won with an injury-time penalty. Most pundits and match reports told the story of a brave Leicester side that deserved a point, and of a lucky Liverpool that had the “margins on its side”.

If you look at the game as a whole, Liverpool’s xG was 3.75 and Leicester’s 0.10. In other words, Leicester were lucky to be in the game at all, and Liverpool could consider themselves unlucky not to have won comfortably. However, if you look at it from the “10 minutes to go, all square” angle, the answer is slightly different. Liverpool created 0.84 xG after the equaliser, most of it from the late penalty (0.76). A penalty can arise from a situation that, to begin with, had a much smaller chance of producing a goal, so yes: after 1-1, Liverpool could be considered lucky.
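The two readings of the Liverpool game differ only in the time window over which xG is summed. A minimal sketch of that windowing, assuming a simple shot log of (minute, team, xG) tuples; apart from the 0.76 penalty and the 0.84 total after the equaliser, every shot and minute below is hypothetical.

```python
def xg_in_window(shots, team, start_minute):
    """Sum a team's xG for shots taken at or after start_minute."""
    return sum(xg for minute, side, xg in shots
               if side == team and minute >= start_minute)


# Hypothetical shot log (minute, team, xG). Only the 0.76 penalty and
# the 0.84 post-equaliser total echo the figures in the text.
shot_log = [
    (30, "Liverpool", 0.45),
    (80, "Leicester", 0.05),   # hypothetical equaliser
    (85, "Liverpool", 0.08),
    (95, "Liverpool", 0.76),   # injury-time penalty
]

EQUALISER_MINUTE = 80  # hypothetical

print(round(xg_in_window(shot_log, "Liverpool", EQUALISER_MINUTE), 2))  # 0.84
print(round(xg_in_window(shot_log, "Liverpool", 0), 2))                 # 1.29
```

The same shot log supports both stories; the only design choice is the `start_minute`, which is exactly the “question” the analyst chooses to ask.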

At season level, Manchester United are in “crisis” after a weak start: 10th after 11 rounds, with their lowest points total for more than 30 years. However, if we look at the xG of each individual game they have played, they outperformed the opposing team, and were “unlucky”, in 8 out of 11 matches. Do you fire the coach based on that?
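Counting “unlucky” matches is a simple filter over per-match data. A minimal sketch, assuming “unlucky” means winning the xG battle without winning the match (the text does not define it precisely); the four matches below are hypothetical, not Manchester United’s actual results.

```python
# Hypothetical per-match data: (goals for, goals against, xG for, xG against).
matches = [
    (1, 1, 2.1, 0.9),
    (0, 1, 1.4, 0.6),
    (2, 0, 1.8, 0.7),
    (1, 2, 0.8, 1.5),
]


def points(goals_for, goals_against):
    """3 points for a win, 1 for a draw, 0 for a loss."""
    if goals_for > goals_against:
        return 3
    return 1 if goals_for == goals_against else 0


def unlucky_matches(data):
    """Count matches where the team won on xG but did not win on the pitch."""
    return sum(1 for gf, ga, xgf, xga in data
               if xgf > xga and points(gf, ga) < 3)


print(f"{unlucky_matches(matches)} of {len(matches)} matches won on xG but not on the pitch")
```

With this made-up data, two of the four matches qualify. Other definitions (for example, comparing actual points with points expected from xG) are equally defensible and may give a different count.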

Knowing the right question to ask is critical. Football results are like business results in that they are the product of many small decisions made by players (or your employees), along with external factors such as weather and referees (VAR…), or competition and legislation, and, a lot of the time, simply luck. In many cases you can come close and win or lose a sale by a very small margin; however, the customer does not buy 52% of a competitor’s car and 48% of yours. The same is true in football, where you can be totally outplayed and still walk away with three points.

You can’t improve without data

Having more data will make things more complicated, and you need to be able to ask the right questions. Data scientists can crunch the numbers for you and ask you some “stupid” questions, but they are only really effective when combined with the existing knowledge in the organisation. In the same way, football coaches are still football coaches, supported by data, and not the other way around.

So why not ask: why do you get so excited about corners, when only 1 in every 70 results in a goal, or roughly one goal every 4.5 to 7 matches?
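The two corner figures are linked by the number of corners per match. Taken together, one goal per 70 corners and one goal every 4.5 to 7 matches imply roughly 10 to 16 corners per match. Whether that count covers both teams is not stated, so treat it as an assumption.

```python
CORNERS_PER_GOAL = 70  # from the text: 1 in every 70 corners results in a goal


def corners_per_match(matches_per_corner_goal):
    """Corners per match implied by one corner goal every N matches."""
    return CORNERS_PER_GOAL / matches_per_corner_goal


print(f"chance a single corner produces a goal: {1 / CORNERS_PER_GOAL:.1%}")
for n in (4.5, 7):
    print(f"one corner goal every {n} matches -> {corners_per_match(n):.1f} corners per match")
```

Put differently, each individual corner has only about a 1.4% chance of producing a goal.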

Or is 2-0 really the most dangerous lead? No, it obviously isn’t.

About the Author

Jesper E. Thomsen

Jesper is the owner of e-thomsen, and you can read more about his activities under "about".
