Why the bizarre Arsenal v Man United game is a statistical outlier

If post-match emotion were a random variable one wanted to investigate, running the random experiment on an Arsenal fan would produce the least biased sample. An Arsenal fan is exposed to brilliant 5-0 performances, steady 2-0 performances, drab 0-0 games and humiliating 0-3 defeats. Such an exhaustive spectrum of post-match emotions cannot exist at any other top-six club (except maybe Liverpool), and surely not at the bottom clubs, because they are seldom exposed to 5-0 performances. So being a Gunner is a special feeling, though that is not the point of this article.

I felt bizarre after the 1-3 defeat to Man United at the Emirates, a feeling that wasn’t listed above. Succumbing to defeat after dominating a game so thoroughly was a new emotion. De Gea made 14 saves in this game, the joint-highest in Premier League history. I did not feel unhappy after the defeat; for the first time, I felt confused. My fellow Gunners disagreed with me and rightly pointed out that our finishing was piss poor and our shot conversion horribly low. That left me wondering: on an average day, would this Arsenal team be this poor at shot conversion? The only way for me to check whether this event at the Emirates was an outlier was through data. So when my friend asked me how improbable this event could be, I set out to look at one year of Arsenal’s Premier League data on shot conversion and shots on target conversion. Here are my results.

Shot Conversion:

Shot Conversion is simply the ratio of the number of goals scored to the number of shots taken in a game. I have to admit that Arsenal have the poorest Shot Conversion among the top six. But they are also joint top with Man City for number of shots taken (270) this season, which increases the number of goals they could score, admittedly with a lot of inefficiency.
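As a minimal sketch of the ratio defined above, the same helper can serve for both shot conversion and shots on target conversion. The function name `conversion_rate` is my own choice for illustration; the inputs are the Arsenal v Man United figures quoted in this article (33 shots, 16 shots on target, 1 goal).

```python
def conversion_rate(goals: int, attempts: int) -> float:
    """Goals divided by attempts (attempts = shots, or shots on target)."""
    if attempts <= 0:
        raise ValueError("attempts must be positive")
    if goals < 0:
        raise ValueError("goals cannot be negative")
    return goals / attempts


# Figures quoted in this article for the Arsenal v Man United game.
shot_conv = conversion_rate(goals=1, attempts=33)
sot_conv = conversion_rate(goals=1, attempts=16)

print(f"Shot conversion: {shot_conv:.1%}")             # 3.0%
print(f"Shots on target conversion: {sot_conv:.2%}")   # 6.25%
```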

Arsenal v Man United Game:

Arsenal took 33 shots against Man United and managed to score 1 goal by the end of it, so the shot conversion was a shocking 3%. The question that remains is how probable it is for Arsenal (with this squad) to produce this number. The plots below were generated using data from Arsenal’s most recent 50 Premier League games; the histograms and subsequent analysis use 10 bins of the random variable (in this case, shot conversion).

[Figure: probability density histogram of Arsenal's shot conversion]

[Figure: cumulative distribution (CDF) of Arsenal's shot conversion]

Bin   X (bin edge)   Y (density)   Probability    Cumulative probability
0     0.03           0             0              0
1     0.053          6.038647      0.138888889    0.138888889
2     0.076          2.415459      0.055555556    0.194444444
3     0.099          4.830918      0.111111111    0.305555556
4     0.122          6.038647      0.138888889    0.444444444
5     0.145          6.038647      0.138888889    0.583333333
6     0.168          4.830918      0.111111111    0.694444444
7     0.191          4.830918      0.111111111    0.805555556
8     0.214          2.415459      0.055555556    0.861111111
9     0.237          2.415459      0.055555556    0.916666667
10    0.26           3.623188      0.083333333    1
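The columns of the shot conversion table appear to be related in the usual histogram way: Y looks like the probability of a bin divided by the bin width, and the cumulative column looks like a running sum of the probabilities. The sketch below checks that reading using only the numbers printed in the table; the variable names are mine, and it is not necessarily how the original figures were produced.

```python
from itertools import accumulate

# Probability column copied from the shot conversion table (bins 1 to 10).
bin_probs = [0.138888889, 0.055555556, 0.111111111, 0.138888889, 0.138888889,
             0.111111111, 0.111111111, 0.055555556, 0.055555556, 0.083333333]

# The X column runs from 0.03 to 0.26 in 10 equal steps.
low, high, n_bins = 0.03, 0.26, 10
bin_width = (high - low) / n_bins

# Density = probability mass in the bin / bin width (compare with the Y column).
density = [p / bin_width for p in bin_probs]

# Cumulative probability = running sum of the per-bin probabilities.
cumulative = list(accumulate(bin_probs))

for i, (d, p, c) in enumerate(zip(density, bin_probs, cumulative), start=1):
    upper_edge = low + i * bin_width
    print(f"{upper_edge:.3f}  {d:9.6f}  {p:.6f}  {c:.6f}")
```

Dividing by the bin width is what makes the histogram integrate to 1, which is why the Y values can exceed 1 even though the probabilities cannot.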

From the probability density plot, CDF plot and table above, we can get an intuition of how probable it is for this Arsenal squad to produce a performance with 3% shot conversion. It is evident from the histogram that Arsenal are quite likely to produce performances with shot conversions of around 3-5%, although the mean is 13.6% with a standard deviation of 6.3%. In fact, this past year’s data suggests that the probability of shot conversion lying between 3% and 5.3% is around 0.1388, or 13.88%, which is quite high! So it is tempting to conclude that this bizarre game was in fact not very bizarre. But nowhere in our probability calculations did we account for the 33 shots taken in this game! In other words, a conditional probability, P(shot conversion <= 3% | number of shots >= 33), is required to give a clearer picture. Although that could be done, it was a little beyond the scope of my discussion, so I turned to another metric that could clear things up better: shots on target conversion for the past year.
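For readers curious about the conditional probability mentioned above, here is a rough sketch of how it could be estimated by simple counting, assuming per-game records of (shots, goals) are available. The function name and the example rows are hypothetical; the rows are made-up placeholders to show the call, not Arsenal's real match figures.

```python
def conditional_probability(games, max_conversion, min_shots):
    """Estimate P(conversion <= max_conversion | shots >= min_shots) by counting.

    `games` is an iterable of (shots, goals) pairs, one per match.
    Returns None when no game satisfies the condition.
    """
    # Keep only the games that satisfy the condition (and have at least one shot).
    conditioned = [(s, g) for s, g in games if s > 0 and s >= min_shots]
    if not conditioned:
        return None
    # Among those games, count the ones with conversion at or below the threshold.
    hits = sum(1 for s, g in conditioned if g / s <= max_conversion)
    return hits / len(conditioned)


# Made-up placeholder rows (shots, goals), purely illustrative.
# These are NOT Arsenal's real match figures.
example_games = [(12, 2), (20, 1), (25, 4), (30, 1), (33, 1)]
print(conditional_probability(example_games, max_conversion=0.04, min_shots=20))  # 0.5
```

One caveat with this counting approach: games with 33 or more shots are rare, so the conditioned sample may contain only a handful of matches and the estimate would be very noisy.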

Shots on Target Conversion:

Shots on Target Conversion is simply the ratio of goals scored to the number of shots on target in a game. Arsenal had 16 shots on target versus Man United and scored 1 goal, which meant a shots on target conversion of about 6% (1/16 = 6.25%). Now let’s look at how probable it is for this Arsenal team to produce this number.

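[Figure: probability density histogram of Arsenal's shots on target conversion]

[Figure: cumulative distribution (CDF) of Arsenal's shots on target conversion]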

Bin   X (bin edge)   Y (density)   Probability    Cumulative probability
0     0.0625         0             0              0
1     0.15625        0.888889      0.083333333    0.083333333
2     0.25           1.185185      0.111111111    0.194444444
3     0.34375        2.666667      0.25           0.444444444
4     0.4375         1.481481      0.138888889    0.583333333
5     0.53125        2.37037       0.222222222    0.805555556
6     0.625          0             0              0.805555556
7     0.71875        0.888889      0.083333333    0.888888889
8     0.8125         0.296296      0.027777778    0.916666667
9     0.90625        0             0              0.916666667
10    1              0.888889      0.083333333    1

The mean shots on target conversion is 43.6% and the standard deviation is 23.6%, both much higher than for shot conversion. There are also a few gaps in the histogram, which simply means that in this past year there haven’t been games with around 60% or 90% shots on target conversion. But it is evident from the histogram that the probability of a shots on target conversion below 20% is very low. In fact, from the table we can conclude that the probability of shots on target conversion lying between 6.25% and 15.625% is around 0.0833, or 8.33%, which is low, and the probability of a shots on target conversion of 6% (that is, less than 6.25%) is close to zero. This reinforces our belief that this Arsenal v Man United game, with 16 shots on target for Arsenal, is indeed an outlier.
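The readings taken from the shots on target table can be reproduced with a small lookup. This sketch copies the bin edges and cumulative probabilities printed in that table and returns the cumulative probability at the largest tabulated edge not exceeding a threshold, which is a lower bound on P(conversion <= threshold) because the table gives no detail inside a bin. The helper name `cdf_lower_bound` is my own.

```python
from bisect import bisect_right

# Bin edges and cumulative probabilities copied from the shots on target table.
edges = [0.0625, 0.15625, 0.25, 0.34375, 0.4375, 0.53125,
         0.625, 0.71875, 0.8125, 0.90625, 1.0]
cumulative = [0.0, 0.083333333, 0.194444444, 0.444444444, 0.583333333,
              0.805555556, 0.805555556, 0.888888889, 0.916666667,
              0.916666667, 1.0]


def cdf_lower_bound(x: float) -> float:
    """Cumulative probability at the largest tabulated edge <= x (0 below the first edge)."""
    i = bisect_right(edges, x)
    return cumulative[i - 1] if i > 0 else 0.0


print(cdf_lower_bound(0.20))    # 0.083333333
print(cdf_lower_bound(0.0625))  # 0.0
```

For a threshold of 20% the lookup lands on the 15.625% edge, so the table supports at least 8.33% and at most 19.44% (the value at the 25% edge) for a conversion below 20%.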

Conclusion:

If this game’s data is excluded, Arsenal had an average shot conversion of 15%, which means 33 shots in a game should yield an expected 0.15 x 33 = 4.95, or about 5, goals if the average is taken as an estimate of future results. In fact, Arsene quoted this stat to defend his team’s result and argue that they were truly unlucky. Our analysis makes clear that it is not so improbable for Arsenal to score 1 goal from 33 shots in a Premier League game, but it is very improbable for them to score just 1 goal from 16 shots on target! The bizarreness that existed in my life has been mitigated to a certain extent; I hope I’ve helped a few others as well. #COYG.
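The expected-goals arithmetic above can be pushed one step further with a simple binomial model. This is not the histogram method used in the article: it is a different, cruder technique that assumes every shot is an independent attempt with the same scoring probability, which ignores shot quality, the goalkeeper and game state. Under that assumption, the sketch below computes the expected goals and the chance of scoring at most 1 goal, using the averages quoted in this article (15% shot conversion, 43.6% shots on target conversion). The function and variable names are my own.

```python
from math import comb


def prob_at_most(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))


# Averages quoted in the article, treated here as per-attempt scoring
# probabilities (a modelling assumption, not something the data confirms).
p_shot = 0.15        # average shot conversion, excluding this game
p_on_target = 0.436  # mean shots on target conversion

expected_goals = p_shot * 33                                   # 0.15 x 33
p_le1_from_shots = prob_at_most(1, 33, p_shot)                 # 33 shots
p_le1_from_on_target = prob_at_most(1, 16, p_on_target)        # 16 on target

print(f"Expected goals from 33 shots: {expected_goals:.2f}")
print(f"P(at most 1 goal | 33 shots): {p_le1_from_shots:.4f}")
print(f"P(at most 1 goal | 16 shots on target): {p_le1_from_on_target:.4f}")
```

Comparing the two probabilities gives a second, model-based view of which of the two numbers from this game is the more surprising one.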

2 thoughts on “Why the bizarre Arsenal v Man United game is a statistical outlier”

  1. Hi, Fantastic job! I am fascinated by your work.

    It elucidates the point that Arsenal failed to capitalise in front of goal, something I imagine many of us may have guessed during the first half. In my case, I was shocked that the score was unbelievably two zero to Man United although it was United that were taking all the heat!

    I have one suggestion and one request:

    1. Suggestion: The data is neat, and easy to comprehend. How about going a step further to figure out the root cause?

    My claim is that a few players are more likely to convert in front of goal. We can intuitively guess this for teams, e.g. Ronaldo for Real Madrid.

    What if Wenger saw this data, and made selections accordingly? Or is he doing that already?

    2. Request: I understand your data, because I was lucky to have a good stats prof in college. However I am a novice in data analysis, and I don’t know how to work on real-life sets.

    Can you please let me know the source of the data, the software you used and your method to obtain these values?

    I have no intention of any monetization activities, just a hobby to analyze interesting data sets.

    Thanks again for such an interesting take on this heart-breaking game, it made my evening!

    1. Thank you Hargun for your kind words.

      Your take on analysing the root cause is interesting. Stats of the players involved in key moments (for example, Alexis’ shot conversion with his left foot and De Gea’s 1v1 shot stopping) would indeed affect the outcome, and the analysis can be taken forward in this direction. I will try to do more in-depth analysis in my subsequent articles.

      If Wenger saw this I’d be elated, but I think every top club already has a sophisticated quant research team. I understand your point about having more efficient players in key situations (better shot conversion etc.), but that’s the constraint, isn’t it? Arsenal is a well-run, self-sufficient club that works within a modest budget. The challenge posed to the manager is to optimize performance with the available set of players (a function of budget and scouting abilities), and hence adapt to a style of play that helps the team achieve predetermined targets. Wenger never gets praised enough, in my opinion, for the job he does at Arsenal for this reason.

      Data collection was painfully manual. We are trying to automate the data collection process and are currently working on it as a project. I used Python to analyse the data; drop me a note by email and I’d be happy to share my code.

      Cheers!
      Vijay (sendvikymails.1@gmail.com)
