ncaahoopR

ncaahoopR is an R package for working with NCAA Basketball Play-by-Play Data. It scrapes play-by-play data and returns it to the user in a tidy format, allowing the user to explore the data with assist networks, shot charts, and in-game win-probability charts.

For pre-scraped schedules, rosters, box scores, and play-by-play data, check out the ncaahoopR_data repository.

To see the latest changes in version 1.5, view the change log here.

Installation

You can install ncaahoopR from GitHub with:

# install.packages("devtools")
devtools::install_github("lbenz730/ncaahoopR")

If you encounter installation issues, the following tips have helped a few users successfully install the package:

Functions

Several functions use ESPN game_ids. You can find the game_id in the URL of the game's ESPN summary page (for example, the summary page for the UMBC-Virginia game).
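
If you are collecting ids for many games, a small helper can pull them out of the URLs for you. The sketch below is hypothetical: extract_game_id is not part of ncaahoopR. It assumes the id follows "gameId" either as a query parameter (?gameId=...) or as a path segment (/gameId/...). ESPN's URL format may change, so treat this as a starting point.

```r
# Hypothetical helper (NOT part of ncaahoopR): extract ESPN game_ids from URLs.
# Assumes the id follows "gameId" as "gameId=<digits>" or "gameId/<digits>".
extract_game_id <- function(url) {
  m <- regmatches(url, regexec("gameId[=/]([0-9]+)", url))
  # Each match is c(full_match, captured_digits); non-matches are character(0)
  vapply(m, function(x) if (length(x) == 2) x[2] else NA_character_, character(1))
}

urls <- c(
  "https://www.espn.com/mens-college-basketball/game?gameId=401082978",
  "https://www.espn.com/mens-college-basketball/game/_/gameId/401168364"
)
extract_game_id(urls)  # "401082978" "401168364"
```

The ids come back as character strings. Wrap them in as.numeric() if you prefer numbers.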

Scraping Data

The team parameter in the scraping functions must be a valid team name from the ids dataset built into the package. See the Datasets section below for more details.

Win-Probability and Game-Flow Charts

Win Probability Charts

The latest function for plotting win probability charts is wp_chart_new. Following the 2021-22 season, the other win probability chart functions will be deprecated and replaced by this function (it will be renamed to wp_chart, but I don't want to break any existing pipelines during the season). It no longer requires users to input colors. For best results, consider saving via ggsave(filename, height = 9/1.2, width = 16/1.2) (or another 16:9 aspect ratio).

wp_chart_new(game_id, home_col = NULL, away_col = NULL, include_spread = T, show_legend = T)

A prior version of wp_chart used base R, while gg_wp_chart used the ggplot2 plotting library. As of the 2020-21 season, both functions use ggplot2, and gg_wp_chart is now simply an alias for wp_chart.

wp_chart(game_id, home_col, away_col, include_spread = T, show_legend = T)

gg_wp_chart(game_id, home_col, away_col, show_labels = T)

Game Flow Charts

game_flow(game_id, home_col, away_col)

Game Excitement Index

game_excitement_index(game_id, include_spread = T)

Returns the GEI (Game Excitement Index) for a given ESPN game_id. For more information about how these win-probability models are fit and how the Game Excitement Index is calculated, check out the links below.
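
The sketch below makes the idea concrete. It assumes an excitement metric is built from how much the win probability swings over the course of a game. excitement_sketch is a hypothetical helper, not a package function. ncaahoopR's actual formula, including any normalization for game length, may differ, so use game_excitement_index() for real games.

```r
# Hypothetical sketch (NOT ncaahoopR's implementation): total absolute movement
# in a win-probability series. Assumption: larger swings = more exciting game.
excitement_sketch <- function(wp) {
  wp <- wp[!is.na(wp)]
  sum(abs(diff(wp)))
}

blowout  <- c(0.50, 0.60, 0.75, 0.90, 1.00)        # steady march to a win
thriller <- c(0.50, 0.70, 0.30, 0.80, 0.20, 1.00)  # lead changes back and forth
excitement_sketch(blowout)   # 0.5
excitement_sketch(thriller)  # 2.5
```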

Game Control Measures

average_win_prob(game_id, include_spread = T)

average_score_diff(game_id)
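
Both measures summarize which team was in control of a game. One natural way to do that is to weight each game state by how long it lasted. The sketch below assumes that approach. time_weighted_mean is hypothetical, and the 2400-second default assumes a 40-minute regulation game with no overtime. The package functions may weight plays differently.

```r
# Hypothetical sketch (NOT the package's implementation): time-weighted mean of
# a game-state series. Assumes each value holds until the next play, and the
# last value holds until the end of a 2400-second (40-minute) game.
time_weighted_mean <- function(value, secs_elapsed, game_length = 2400) {
  stopifnot(length(value) == length(secs_elapsed), !is.unsorted(secs_elapsed))
  # How long each state lasted: gap to the next play, or to the final buzzer
  durations <- diff(c(secs_elapsed, game_length))
  sum(value * durations) / sum(durations)
}

score_diff   <- c(0, 5, -3)           # home minus away
home_wp      <- c(0.50, 0.80, 0.30)   # home win probability
secs_elapsed <- c(0, 1200, 2000)

time_weighted_mean(score_diff, secs_elapsed)  # 2800 / 2400 = 1.166667
time_weighted_mean(home_wp, secs_elapsed)     # 1360 / 2400 = 0.5666667
mean(score_diff)                              # unweighted, for comparison: 0.6666667
```

Weighting by duration keeps a flurry of quick possessions from counting more than a long stretch in which one team held the lead.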

Assist Networks

Traditional Assist Networks

assist_net(team, season, node_col, three_weights = T, threshold = 0, message = NA, return_stats = T)

Circle Assist Networks and Player Highlighting

circle_assist_net(team, season, highlight_player = NA, highlight_color = NA, three_weights = T, threshold = 0, message = NA, return_stats = T)
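
An assist network is a weighted, directed graph from passer to scorer. The following sketch shows one way to build its edge list from assisted baskets. The column names (passer, scorer, three) and the 1.5 weight for assisted three-pointers are assumptions for illustration. They are not ncaahoopR internals.

```r
# Hypothetical sketch (NOT ncaahoopR internals): weighted passer -> scorer edges.
# Assumption: with three_weights = TRUE, an assisted three counts 1.5 instead of 1.
assist_edges <- function(plays, three_weights = TRUE) {
  w <- if (three_weights) ifelse(plays$three, 1.5, 1) else rep(1, nrow(plays))
  # Sum the weights for every (passer, scorer) pair
  edges <- aggregate(list(weight = w),
                     by = list(from = plays$passer, to = plays$scorer),
                     FUN = sum)
  edges[order(-edges$weight, edges$from, edges$to), ]
}

plays <- data.frame(
  passer = c("A", "A", "B", "A", "C"),
  scorer = c("B", "B", "A", "C", "A"),
  three  = c(TRUE, FALSE, FALSE, TRUE, TRUE)
)
assist_edges(plays)  # A->B 2.5, A->C 1.5, C->A 1.5, B->A 1
```

A from/to/weight table like this can be passed to igraph::graph_from_data_frame() if you want to plot the network yourself.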

Shot Charts

There are currently four functions for scraping and plotting shot location data. These functions were written by Meyappan Subbaiah.

get_shot_locs(game_id): Returns data frame with shot location data when available. Note that if the extra_parse flag in get_pbp_game is set to TRUE, shot location data will already be included in the play-by-play data (if available).

game_shot_chart(game_id, heatmap = F): Plots shots for a given game.

team_shot_chart(game_ids, team, heatmap = F): Plots shots taken by team during a given set of game(s).

opp_shot_chart(game_ids, team, heatmap = F): Plots shots against a team during a given set of game(s).

Datasets

dict A data frame for converting between team names from various sites.

ids A data frame for converting team names to ESPN team IDs. Its team names are the valid values for the team parameter.

ncaa_colors A data frame of team color hex codes, pulled from teamcolorcodes.com. Additional data coverage provided by Luke Morris.

Primary and secondary colors are available for all 353 teams.

These datasets can be loaded by typing data("dict"), data("ids"), or data("ncaa_colors"), respectively.

Examples

Win Probability Charts

wp_chart_new(401403405)

wp_chart(game_id = 401082978, home_col = "gray", away_col = "orange")

wp_chart(game_id = 401168364, home_col = "#7BAFD4", away_col = "#001A57")

Game Flow Chart

game_flow(game_id = 401082669, home_col = "blue", away_col = "navy")

Single-Game Assist Network

assist_net(team = "Oklahoma", node_col = "firebrick4", season = 400989185)

Season-Long Assist Network

assist_net(team = "Yale", node_col = "royalblue4", season = "2017-18")

Circle Assist Networks

circle_assist_net(team = "UNC", season = 401082861)

Player Highlighting

circle_assist_net(team = "San Francisco", season = "2018-19", highlight_player = "Frankie Ferrari", highlight_color = "#FDBB30")

Shot Charts

game_shot_chart(game_id = 401168364, heatmap = T)

game_shot_chart(game_id = 401168364)

Glossary

Play-by-Play files contain the following variables:

If extra_parse = TRUE in get_pbp_game, the following variables are also included.


Stand-alone shot location data frames contain the following variables.

Raw Shot Location Data

The court is 94 feet long (baseline to baseline, interior) and 50 feet wide (sideline to sideline, interior). The court's origin is located at center court, with the court displayed horizontally (the baskets lie along the x axis). In this coordinate grid, -x corresponds to the left basket and +x to the right. +y corresponds to the upper sideline of the court, and -y to the lower.

Following ESPN's convention, the home team's shot locations are on the +x basket, and the visiting team's on the -x basket. The center of each basket is at (+/-41.75, 0).

The data pulled via get_shot_locs() follows this orientation.
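
As an example of working in this coordinate system, the sketch below computes each shot's distance from the basket it was aimed at. It uses the raw layout: home team shooting at (+41.75, 0) and away team at (-41.75, 0). The is_home flag and the column names are assumptions for the sketch. Check names(get_shot_locs(game_id)) for the columns actually returned.

```r
# Hypothetical sketch: distance (feet) from a shot to its target basket in the
# raw get_shot_locs() orientation. Assumes home shoots at (+41.75, 0) and away
# at (-41.75, 0); x, y, and is_home are illustrative column names.
shot_distance <- function(x, y, is_home) {
  basket_x <- ifelse(is_home, 41.75, -41.75)
  sqrt((x - basket_x)^2 + y^2)
}

shots <- data.frame(
  x       = c(41.75, 20.75, -41.75, -20.75),
  y       = c(0, 0, 23.75, 0),
  is_home = c(TRUE, TRUE, FALSE, FALSE)
)
shot_distance(shots$x, shots$y, shots$is_home)  # 0, 21, 23.75, 21
```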

Shot Chart Data

For the shot chart functions, the x and y coordinates are "flipped" so that the court is oriented vertically and each team appears to be shooting on the same basket. That is, the home team and away team are both shooting on a basket centered at (0, -41.75). This is done for convenience and does not affect any underlying analyses.
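
One way to produce that vertical layout is a 90-degree rotation that sends each team's basket to (0, -41.75). The sketch below assumes a rotation is used. ncaahoopR's own transformation may instead mirror one axis, which would place shots on the opposite side of the lane. Either way, distances to the basket are unchanged.

```r
# Hypothetical sketch (the package's actual transform may differ): rotate raw
# horizontal coordinates so both teams shoot at a basket centered at (0, -41.75).
to_vertical <- function(x, y, is_home) {
  data.frame(
    # Home basket (+41.75, 0): rotate 90 degrees clockwise, (x, y) -> (y, -x)
    # Away basket (-41.75, 0): rotate 90 degrees counterclockwise, (x, y) -> (-y, x)
    x = ifelse(is_home, y, -y),
    y = ifelse(is_home, -x, x)
  )
}

shots <- data.frame(x = c(30, -30), y = c(10, 10), is_home = c(TRUE, FALSE))
to_vertical(shots$x, shots$y, shots$is_home)  # home -> (10, -30), away -> (-10, -30)
```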