Exploring spatio-temporal soccer events using public event data

<img width="200" src="https://raw.githubusercontent.com/scikit-mobility/tutorials/master/AMLD%202020/sobigdata_logo.jpg" />

Tutorial supported by EU project <a href="https://cordis.europa.eu/project/id/871042">SoBigData++</a> RI (Grant Agreement 871042).

A video version of the tutorial is available on YouTube at:

The code has been developed by:

and explores the events in an open collection of soccer-logs described in the following paper (please cite it if you use the public data or the code in this folder):

<a id='datapaper'></a>

Data collection

The soccer-logs have been collected and provided by <a href="https://wyscout.com/">Wyscout</a>. Data collection is performed by expert video analysts (the operators), trained specifically in soccer data collection, using proprietary software (the tagger). The tagger has been developed and improved over several years and is continuously updated to improve its performance.

To guarantee accurate data collection, the events of each match are tagged from the match video, using the tagger, by three operators: one for each team and one supervisor responsible for the output of the whole match. Optionally, for near-live data delivery, a team of four operators is used, one of whom speeds up the collection of complex events that require additional, specific attributes or a quick review. <a id='datapaper'></a> Further details on data collection can be found in the data paper (PCR2019).

Data Records

The data sets are released under the CC BY 4.0 License and are publicly available on figshare:

The data refer to the 2017/2018 season of five national soccer competitions in Europe: the Spanish, Italian, English, German and French first divisions. In addition, there are data about the World Cup 2018 and the European Cup 2016, which are competitions for national teams. In total, we provide seven data sets, containing information about competitions, matches, teams, players, events, referees and coaches.

Each data set is provided in JSON (JavaScript Object Notation) format. The following table lists the available competitions with their number of matches, events and players. In total, the data cover 1,941 matches, 3,251,294 events and 4,299 players.

| Competition | Matches | Events | Players |
|---|---:|---:|---:|
| Spanish first division | 380 | 628,659 | 619 |
| English first division | 380 | 643,150 | 603 |
| Italian first division | 380 | 647,372 | 686 |
| German first division | 306 | 519,407 | 537 |
| French first division | 380 | 632,807 | 629 |
| World Cup 2018 | 64 | 101,759 | 736 |
| European Cup 2016 | 51 | 78,140 | 552 |
| Total | 1,941 | 3,251,294 | 4,299 |
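Because every data set is a JSON file, per-competition figures like those in the table can be recomputed with Python's standard `json` module. The following is a minimal sketch, not the authors' code: it assumes that each events file is a JSON array of event objects carrying `matchId` and `playerId` keys, and the function names (`load_events`, `summarize`, `summarize_files`) are hypothetical. Counts obtained this way may differ slightly from the table, depending on how players are counted.

```python
import json
from pathlib import Path


def load_events(path):
    """Load one events file, assumed to be a JSON array of event objects."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def summarize(events):
    """Return (#matches, #events, #players) for a list of event dicts.

    Assumes (hypothetically) that each event has 'matchId' and 'playerId' keys.
    """
    matches = {e["matchId"] for e in events}
    players = {e["playerId"] for e in events}
    return len(matches), len(events), len(players)


def summarize_files(paths):
    """Map each competition name to (#matches, #events, #players), plus a 'Total' row.

    `paths` maps a competition name to the path of its events file.
    Matches and events are summed across competitions; players are
    de-duplicated, since the same player may appear in several competitions.
    """
    rows = {}
    total_matches = total_events = 0
    all_players = set()
    for name, path in paths.items():
        events = load_events(path)
        rows[name] = summarize(events)
        total_matches += rows[name][0]
        total_events += rows[name][1]
        all_players.update(e["playerId"] for e in events)
    rows["Total"] = (total_matches, total_events, len(all_players))
    return rows
```

For example, `summarize_files({"Italian first division": "events_Italy.json"})` returns a dictionary mapping each competition, plus a `"Total"` row, to a `(matches, events, players)` tuple; the file name is a hypothetical placeholder for whichever events file was downloaded from figshare. In this sketch the player total is computed over distinct IDs, so it need not equal the sum of the per-competition player counts.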

Outline of the tutorial