A course, a book and a new project
So much news… Our new course “Data Science for Football Professionals” has finally been announced, with a starting date in November 2023. My new book “Calcio e Intelligenza Artificiale” (for now only in Italian) is in press with Carocci, the largest Italian University Press, and will be available in bookshops from January 2024. I am also proud to support a group of bright Edinburgh students whose project on Women’s Super League data has been accepted to the StatsBomb conference coming up this October.
The course
So let’s start with the course. After designing and delivering training programmes for the Scottish Football Association and various independent training bodies internationally, and having recently been commissioned to evaluate the football data analyst accreditation provided by one of the leading football data platforms, I decided I wanted to try and build my own training programme. I partnered with the Data Upskilling Short Courses initiative at the Bayes Institute, supported by Data Driven Innovation (DDI), and after twelve months of development here we are with our own version of training in football analytics. This course responds to what, in my experience, is a gap in football data analysis education internationally: the provision of basic training for domain experts with no background in statistics or coding. As well as not requiring any pre-existing data literacy, the course aims to reflect the reality of lower league clubs and youth academies, where data is used differently from top clubs. It is therefore most fitting that joining the course team is Akhil Khan, the lead Data Analyst at Sunderland AFC, who will give a down-to-earth view of how data is used in football.
“The Italian scene on football analytics is very interesting and quite different from the math-strong UK tradition.”
The book
The idea for the book developed alongside the course. My main intention, and the reason I decided to write it in Italian first, is to spark a conversation with training providers on what the prerequisites and the learning outcomes should look like in a course on football data analysis. The Italian scene on football analytics is very interesting and quite different from the math-strong UK tradition. Heavily influenced by the most established school in tactical video analysis, it reflects a unique mix of football expertise and statistical knowledge. The book is also a reflection, from a sociology of science perspective, on how video data and numerical data overlap and intertwine in football analysis. The first couple of chapters are a long sociological rant arguing that the reason for the limited success of the ‘Moneyball’ approach to football is not the innumeracy of the “real football man” but the intrinsic rigidity of the statistical approach in coming to terms with the multiple and relative measures of success of a football club. I then present the results of an interview-based study, comprising more than 40 interviews with coaches, analysts and directors at the top of Italian football and in the UK, to show their different views on what analytics could bring to football. For contrast, the book also reports the results of an ethnographic study of “public analysts”: young football fans with a math degree who use Twitter to post their data-heavy tactical analyses. The book concludes with my perspective on football analytics education.
“With women’s football, there will always be scarcity of data.”
A new project
Last but not least is my recent collaboration with a group of Edinburgh PhD students who contacted me in the summer to initiate a data analytics study on women’s football. After securing StatsBomb data on the last five years of the Women’s Super League (WSL), the team started to rethink the current football saying “roles are no longer positions, but functions” in the context of WSL data. The starting idea is a Soccerment paper that tries to assess players by overcoming the concept of “position” and looking for “hybrid” clusters, i.e. clusters of players who do a bit of everything, carrying their “function” wherever they are on the pitch. Even Guardiola says his style of play is “leftist”: every player does everything on the pitch, which is a fresher version of the Dutch total football of Cruyff’s times.
With women’s data, the method is as exciting as the content, because with women’s football there will always be a scarcity of data. Three WSL seasons produce less data than one Premier League season. The Soccerment paper says of men’s football data: “we do not include goalkeepers because the number of players is not sufficient to perform a full clustering analysis with the same methodology used for outfield players”. If 20 players (one per each of the 20 teams in the league) x 5 leagues (the Soccerment study included data from the top five European leagues) x 4 seasons is not enough for their method, then no player role in the WSL will have enough players either. How can we adjust to work with data scarcity in women’s football without missing out on the power of analytics? First, inclusion thresholds should be set differently from those used for men’s football data. Second, stats need to be chosen with more purpose: do we use more stats per player to produce more data, or fewer stats that distinguish better between players? Third, normalisation should be thought through differently. Take normalisation by playing time: if three WSL seasons produce less data than one Premier League season, the values used should be roughly one third of those used in analyses of men’s data. In men’s football, analysts also normalise because dominant teams (those with more of the ball) have higher attacking stats and lower defensive stats. Dominance is so pronounced in men’s football because of the budget disparities between a few wealthy teams at the top and the rest. As well as normalising by one third of the values, one should ask whether dominance in the WSL is as big a problem as it is in men’s football…
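To make the point about thresholds and playing-time normalisation a bit more concrete, here is a minimal Python sketch. It is not the method used by the students or by Soccerment: the 900-minute threshold for men’s data, the player records and the names `MEN_MIN_MINUTES`, `WSL_DIVISOR`, `per_90` and `eligible` are all assumptions made up for illustration. It only shows how cutting the inclusion threshold to one third changes which players survive the filter before their counts are turned into per-90 rates.

```python
from dataclasses import dataclass

# Hypothetical inclusion threshold for a men's dataset (minutes played).
MEN_MIN_MINUTES = 900
# Assumption taken from the text: use one third of that value for WSL data.
WSL_DIVISOR = 3


@dataclass
class PlayerSeason:
    name: str
    minutes: int
    tackles: int


def per_90(count: int, minutes: int) -> float:
    """Express a raw count as a rate per 90 minutes played."""
    return count * 90 / minutes


def eligible(players, min_minutes):
    """Keep only players who reach the inclusion threshold."""
    return [p for p in players if p.minutes >= min_minutes]


# Made-up players, for illustration only.
players = [
    PlayerSeason("Player A", 1200, 40),
    PlayerSeason("Player B", 450, 20),
    PlayerSeason("Player C", 200, 12),
]

wsl_threshold = MEN_MIN_MINUTES // WSL_DIVISOR  # 300 minutes

kept_men_rule = eligible(players, MEN_MIN_MINUTES)  # threshold as for men's data
kept_wsl_rule = eligible(players, wsl_threshold)    # threshold cut to one third

# Per-90 tackle rates for the players kept under the lower threshold.
rates = {p.name: per_90(p.tackles, p.minutes) for p in kept_wsl_rule}

print([p.name for p in kept_men_rule])
print(rates)
```

With these invented numbers the lower threshold keeps one extra player in the sample, at the price of a rate estimated from fewer minutes, which is exactly the trade-off to weigh when data is scarce.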