understatr

Overview

An R package to help with retrieving tidy understat data.

Install

understatr is unlikely to be submitted to CRAN. Install the latest development version from GitHub (this requires the remotes package):

remotes::install_github('ewenme/understatr')

Use

library(understatr)

Check currently available leagues/seasons:

get_leagues_meta()
#> # A tibble: 48 × 4
#>    league_name  year season    url                                        
#>    <chr>       <dbl> <chr>     <chr>                                      
#>  1 EPL          2021 2021/2022 https://understat.com/league/EPL/2021      
#>  2 EPL          2020 2020/2021 https://understat.com/league/EPL/2020      
#>  3 EPL          2019 2019/2020 https://understat.com/league/EPL/2019      
#>  4 EPL          2018 2018/2019 https://understat.com/league/EPL/2018      
#>  5 EPL          2017 2017/2018 https://understat.com/league/EPL/2017      
#>  6 EPL          2016 2016/2017 https://understat.com/league/EPL/2016      
#>  7 EPL          2015 2015/2016 https://understat.com/league/EPL/2015      
#>  8 EPL          2014 2014/2015 https://understat.com/league/EPL/2014      
#>  9 La liga      2021 2021/2022 https://understat.com/league/La%20liga/2021
#> 10 La liga      2020 2020/2021 https://understat.com/league/La%20liga/2020
#> # … with 38 more rows
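
As a rough sketch of working with this result, the snippet below picks out the most recent season listed for one league. So that it runs without a network connection, it uses a small hand-typed data frame copying four rows of the output above; `leagues_sample`, `epl` and `latest_epl` are illustrative names, not part of understatr. With the package loaded, the same subsetting should apply to the tibble returned by `get_leagues_meta()`.

```r
# Hand-copied subset of the get_leagues_meta() output shown above
# (an illustrative stand-in so the example runs offline).
leagues_sample <- data.frame(
  league_name = c("EPL", "EPL", "La liga", "La liga"),
  year        = c(2021, 2020, 2021, 2020),
  season      = c("2021/2022", "2020/2021", "2021/2022", "2020/2021"),
  stringsAsFactors = FALSE
)

# Keep only the rows for one league, then take the latest starting year.
epl <- leagues_sample[leagues_sample$league_name == "EPL", ]
latest_epl <- max(epl$year)
latest_epl
```

A year found this way is the kind of value the `year` argument of `get_team_players_stats()` expects.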

Get stats for a team’s playing squad in a league season:

get_team_players_stats(team_name = "Manchester City", year = 2018)
#> 
#> ── Column specification ────────────────────────────────────────────────────────
#> cols(
#>   player_id = col_double(),
#>   player_name = col_character(),
#>   games = col_double(),
#>   time = col_double(),
#>   goals = col_double(),
#>   xG = col_double(),
#>   assists = col_double(),
#>   xA = col_double(),
#>   shots = col_double(),
#>   key_passes = col_double(),
#>   yellow_cards = col_double(),
#>   red_cards = col_double(),
#>   position = col_character(),
#>   team_name = col_character(),
#>   npg = col_double(),
#>   npxG = col_double(),
#>   xGChain = col_double(),
#>   xGBuildup = col_double()
#> )
#> # A tibble: 21 × 19
#>    player_id player_name     games  time goals    xG assists     xA shots key_passes
#>        <dbl> <chr>           <dbl> <dbl> <dbl> <dbl>   <dbl>  <dbl> <dbl>      <dbl>
#>  1       619 Sergio Agüero      33  2515    21 19.9        8  5.23    118         34
#>  2       618 Raheem Sterling    34  2788    17 15.9       10 10.8      77         66
#>  3       337 Leroy Sané         31  1866    10  6.98      10  8.10     56         40
#>  4       750 Riyad Mahrez       27  1333     7  6.62       4  5.01     54         24
#>  5      3635 Bernardo Silva     36  2851     7  8.20       7  8.63     62         71
#>  6      5543 Gabriel Jesus      29   993     7 12.6        3  2.65     43         21
#>  7       314 Ilkay Gündogan     31  2133     6  4.21       3  4.97     43         43
#>  8       617 David Silva        33  2426     6  8.13       8 10.1      51         73
#>  9      2498 Aymeric Laporte    35  3059     3  3.75       3  0.839    26         13
#> 10       447 Kevin De Bruyne    19   965     2  1.47       2  6.65     31         36
#> # … with 11 more rows, and 9 more variables: yellow_cards <dbl>,
#> #   red_cards <dbl>, position <chr>, team_name <chr>, npg <dbl>, npxG <dbl>,
#> #   xGChain <dbl>, xGBuildup <dbl>, year <dbl>
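
The squad table lends itself to simple derived metrics. The following is a minimal sketch, assuming `time` is minutes played, that computes goals per 90 minutes and the gap between goals and xG. It uses three rows copied by hand from the output above (with xG as rounded in the printout) so that it runs offline; `squad_sample` and the two new column names are hypothetical and not produced by understatr.

```r
# Three rows copied from the get_team_players_stats() output above.
# xG values are the rounded figures as printed, not full precision.
squad_sample <- data.frame(
  player_name = c("Sergio Agüero", "Raheem Sterling", "Gabriel Jesus"),
  time        = c(2515, 2788, 993),
  goals       = c(21, 17, 7),
  xG          = c(19.9, 15.9, 12.6),
  stringsAsFactors = FALSE
)

# Goals scaled to a 90-minute rate (assumes `time` is minutes played).
squad_sample$goals_per90 <- squad_sample$goals / squad_sample$time * 90

# Positive values mean more goals scored than the xG model expected.
squad_sample$goals_minus_xG <- squad_sample$goals - squad_sample$xG

# Show players ordered by scoring rate, highest first.
squad_sample[order(-squad_sample$goals_per90), ]
```

The same two column assignments should work unchanged on the full tibble, since it carries the `time`, `goals` and `xG` columns listed in the column specification.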

Get stats for a player across all seasons:

get_player_seasons_stats(player_id = 618)
#> 
#> ── Column specification ────────────────────────────────────────────────────────
#> cols(
#>   position = col_character(),
#>   games = col_double(),
#>   goals = col_double(),
#>   shots = col_double(),
#>   time = col_double(),
#>   xG = col_double(),
#>   assists = col_double(),
#>   xA = col_double(),
#>   key_passes = col_double(),
#>   year = col_double(),
#>   team_name = col_character(),
#>   yellow = col_double(),
#>   red = col_double(),
#>   npg = col_double(),
#>   npxG = col_double(),
#>   xGChain = col_double(),
#>   xGBuildup = col_double(),
#>   player_name = col_character()
#> )
#> # A tibble: 8 × 19
#>   position games goals shots  time    xG assists     xA key_passes  year
#>   <chr>    <dbl> <dbl> <dbl> <dbl> <dbl>   <dbl>  <dbl>      <dbl> <dbl>
#> 1 FWL          3     1     7   128  1.71       0  0.111          1  2021
#> 2 AML         31    10    70  2539 12.1        7  6.63          39  2020
#> 3 FWL         33    20   100  2678 19.8        1  7.21          48  2019
#> 4 AML         34    17    77  2788 15.9       10 10.8           66  2018
#> 5 Sub         33    18    87  2594 18.8       11  8.84          55  2017
#> 6 AMR         33     7    64  2532  8.11       6  5.50          46  2016
#> 7 AML         31     6    52  1943  7.15       2  3.25          35  2015
#> 8 AML         35     7    84  3059  8.79       7  6.04          75  2014
#> # … with 9 more variables: team_name <chr>, yellow <dbl>, red <dbl>, npg <dbl>,
#> #   npxG <dbl>, xGChain <dbl>, xGBuildup <dbl>, player_id <dbl>,
#> #   player_name <chr>
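
A per-season table like this can be summarised with base R. The sketch below copies the `year`, `games` and `goals` columns from the output above into a hand-built data frame (named `player_sample` here purely for illustration) and finds the total goals and the highest-scoring season. Player id 618 corresponds to Raheem Sterling in the squad table shown earlier.

```r
# year, games and goals copied from the get_player_seasons_stats() output above
# (an offline stand-in for the tibble the function returns).
player_sample <- data.frame(
  year  = c(2021, 2020, 2019, 2018, 2017, 2016, 2015, 2014),
  games = c(3, 31, 33, 34, 33, 33, 31, 35),
  goals = c(1, 10, 20, 17, 18, 7, 6, 7)
)

# Total goals across all listed seasons.
total_goals <- sum(player_sample$goals)

# Season (starting year) with the most goals.
best_year <- player_sample$year[which.max(player_sample$goals)]

total_goals  # 86
best_year    # 2019
```

Note that the 2021 row covers only 3 games in this printout, so totals taken from it reflect the data as retrieved at that time.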

Issues

If you encounter a clear bug, please file an issue with a minimal reproducible example on GitHub. For questions and other discussion, try Stack Overflow or e-mail.

Disclaimer

While there is no official notice on the site condoning web scraping, Understat's support team has previously confirmed (via e-mail exchange on 8 November 2018) that their data is free to use for non-commercial purposes. This stance is subject to change.

Also, be polite and attribute the source.

Please note that this project is released with a Contributor Code of Conduct. By participating in this project you agree to abide by its terms.