Kaggle NFL solution: 1st-stage 45th, 2nd-stage 42nd place

Kaggle NFL Big Data Bowl

<p align="center"> <img src="readme_images/Kaggle_NFL_CNN.PNG" width="495" height="291"> </p>

Summary:

2D CNN (Convolutional Neural Network) for sparse heatmap images and MLP for tabular data

Input tensor for CNN

Generated heatmap-like field images on a grid of 30 x 54 yards, covering (YardLine - 10) <= X < (YardLine + 20) and 0 <= Y < 54, with coordinates rounded to integer yards.
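
As an illustration of the grid described above, here is a minimal sketch of how one channel could be rasterized. The function name `make_heatmap`, the `(x, y, value)` input layout, the use of Python's built-in `round`, and the choice to accumulate values that land in the same cell are all assumptions made for this example, not details taken from the repository.

```python
def make_heatmap(points, yard_line, width=30, height=54):
    """Rasterize (x, y, value) points onto a width x height integer-yard grid.

    Column 0 corresponds to X = yard_line - 10, so the grid covers
    (yard_line - 10) <= X < (yard_line + 20) and 0 <= Y < 54.
    """
    grid = [[0.0] * height for _ in range(width)]
    for x, y, value in points:
        col = round(x) - (yard_line - 10)  # assumption: built-in rounding
        row = round(y)
        # Points that fall outside the window are dropped.
        if 0 <= col < width and 0 <= row < height:
            grid[col][row] += value  # assumption: overlapping values are summed
    return grid
```

For example, `make_heatmap([(35.2, 20.7, 1.0)], yard_line=40)` places the value in column 5, row 21, and leaves every other cell at zero, which is why the resulting images are sparse.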

After several experiments, the following combination of 18 (= 3 x 3 x 2) channels worked best.

3 player categories:

3 variables:

2 frames:

Computed a second snapshot, 1 second later, by adding the speed to the position. (Adding acceleration was also tried, but it did not improve performance.)
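
The second frame amounts to a constant-velocity extrapolation. A minimal sketch, assuming the speed has already been split into per-axis components `vx` and `vy` in yards per second (the function name and arguments are hypothetical):

```python
def extrapolate(x, y, vx, vy, dt=1.0):
    """Predict the position dt seconds later, assuming constant velocity."""
    return x + vx * dt, y + vy * dt
```

A player at (10.0, 20.0) moving at (3.0, -1.5) yards per second would be placed at (13.0, 18.5) in the second frame.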

CNN architecture

The CNN part of the architecture was configured in YAML for PyTorch as follows.

(Please see PipelineX for the syntax.)

  =: torch.nn.Sequential
  _:
    - {=: pipelinex.TensorSlice, end: 18}
    - =: pipelinex.ModuleConcat
      _:
        - {=: pipelinex.TensorConv2d, in_channels: 18, out_channels: 10, kernel_size: [3, 3]}
        - {=: pipelinex.TensorConv2d, in_channels: 18, out_channels: 10, kernel_size: [7, 7]}
        - {=: pipelinex.TensorConv2d, in_channels: 18, out_channels: 10, kernel_size: [3, 9]}
    - {=: torch.nn.CELU, alpha: 1.0}
    - =: pipelinex.ModuleConcat
      _:
        - {=: pipelinex.TensorAvgPool2d, stride: [1, 2], kernel_size: [3, 3]}
        - {=: pipelinex.TensorConv2d, stride: [1, 2], in_channels: 30, out_channels: 10, kernel_size: [3, 3]}
        - {=: pipelinex.TensorConv2d, stride: [1, 2], in_channels: 30, out_channels: 10, kernel_size: [7, 7]}
        - {=: pipelinex.TensorConv2d, stride: [1, 2], in_channels: 30, out_channels: 10, kernel_size: [3, 9]}
    - {=: torch.nn.CELU, alpha: 1.0}
    - =: pipelinex.ModuleConcat
      _:
        - {=: pipelinex.TensorAvgPool2d, stride: [1, 2], kernel_size: [3, 3]}
        - {=: pipelinex.TensorConv2d, stride: [1, 2], in_channels: 60, out_channels: 20, kernel_size: [3, 3]}
        - {=: pipelinex.TensorConv2d, stride: [1, 2], in_channels: 60, out_channels: 20, kernel_size: [7, 7]}
        - {=: pipelinex.TensorConv2d, stride: [1, 2], in_channels: 60, out_channels: 20, kernel_size: [3, 9]}
      # -> [N, 120, 30, 14]
    - {=: torch.nn.CELU, alpha: 1.0}
    - =: pipelinex.ModuleConcat
      _:
        - =: torch.nn.Sequential
          _:
            - {=: torch.nn.AvgPool2d, stride: [1, 2], kernel_size: [3, 14]}
            # -> [N, 120, 28, 1]
            - {=: pipelinex.TensorConv2d, in_channels: 120, out_channels: 20, kernel_size: [1, 1]}
            - {=: pipelinex.TensorFlatten, _: }
            - {=: torch.nn.CELU, _: }
        - =: torch.nn.Sequential
          _:
            - {=: torch.nn.MaxPool2d, stride: [1, 2], kernel_size: [3, 14]}
            # -> [N, 120, 28, 1]
            - {=: pipelinex.TensorConv2d, in_channels: 120, out_channels: 20, kernel_size: [1, 1]}
            - {=: pipelinex.TensorFlatten, _: }
            - {=: torch.nn.CELU, _: }
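
The shape comments in the configuration above can be checked with a little arithmetic. The sketch below is not part of the repository. It assumes that `pipelinex.ModuleConcat` concatenates branch outputs along the channel axis, that the `pipelinex` conv and pooling modules pad so the output length is `ceil(input / stride)`, and that each flattened branch yields `channels * height * width` features. These assumptions are consistent with the `[N, 120, 30, 14]` and `[N, 120, 28, 1]` comments, but they are inferred rather than confirmed.

```python
import math


def same_padded(size, stride):
    # Assumption: the pipelinex modules pad so that output == ceil(input / stride).
    return math.ceil(size / stride)


def unpadded(size, kernel, stride):
    # Standard PyTorch output length with no padding and dilation 1.
    return (size - kernel) // stride + 1


def trace_shapes(height=30, width=54):
    """Return (channels, height, width) after each block, ignoring batch size."""
    shapes = {}
    channels = 3 * 10  # block 1: three conv branches, 10 channels each
    shapes["block1"] = (channels, height, width)
    channels = channels + 3 * 10  # block 2: avg-pool branch + three conv branches
    width = same_padded(width, 2)
    shapes["block2"] = (channels, height, width)
    channels = channels + 3 * 20  # block 3: avg-pool branch + three conv branches
    width = same_padded(width, 2)
    shapes["block3"] = (channels, height, width)
    # Head: torch.nn.AvgPool2d / MaxPool2d with kernel [3, 14], stride [1, 2].
    pooled = (channels, unpadded(height, 3, 1), unpadded(width, 14, 2))
    shapes["pooled"] = pooled
    # Each branch: 1x1 conv to 20 channels, then flatten; two branches concatenated.
    shapes["features"] = 2 * 20 * pooled[1] * pooled[2]
    return shapes
```

Under these assumptions the CNN part emits 1120 features per sample (2 branches x 20 channels x 28 positions).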

MLP (Multilayer Perceptron)

Add another channel to encode tabular features.

Continuous features

Max, min, mean, and standard deviation for each axis (X, Y) and each player category (Defense, Offense)
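
A minimal sketch of these 16 summary features (4 statistics x 2 axes x 2 player categories). The input layout (a list of dicts with `team`, `X`, and `Y` keys), the feature names, and the use of the population standard deviation are assumptions for illustration only.

```python
from statistics import mean, pstdev


def continuous_features(players):
    """Compute max/min/mean/std of X and Y separately for Defense and Offense.

    players: list of dicts with hypothetical keys 'team', 'X', 'Y'.
    """
    feats = {}
    for team in ("Defense", "Offense"):
        for axis in ("X", "Y"):
            vals = [p[axis] for p in players if p["team"] == team]
            feats[f"{team}_{axis}_max"] = max(vals)
            feats[f"{team}_{axis}_min"] = min(vals)
            feats[f"{team}_{axis}_mean"] = mean(vals)
            # Assumption: population standard deviation (not sample).
            feats[f"{team}_{axis}_std"] = pstdev(vals)
    return feats
```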

Categorical features (One-hot encoded)

Computing the CDF output

  1. Concatenate outputs of CNN and MLP.
  2. Add base probabilities, computed by counting the Yards over the whole 2018 train dataset, so that the neural network learns the residual.
  3. Force predicted probabilities for yards beyond the goal line to 0.
  4. Pad with 0 at the lower (-99 <= Yards < -10) and upper (90 <= Yards < 100) ends.
  5. Divide by the sum over Yards so that the probabilities sum to 1 (SoftMax without the exponential).
  6. Compute cumulative sum through Yards
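
Steps 2 to 6 can be sketched in plain Python as follows. This is an illustrative reading of the list, not the repository's code: the function name `to_cdf`, the `yards_to_goal` argument, and the clamping of negative values to zero before normalizing are assumptions. The inputs are taken to cover the 100 values with -10 <= Yards < 90, and the output covers the 199 values from -99 to 99.

```python
from itertools import accumulate


def to_cdf(residual, base_probs, yards_to_goal, low=-10, high=90,
           min_yards=-99, max_yards=99):
    """Turn network outputs for Yards in [low, high) into a CDF over [-99, 99]."""
    core = []
    for i, (r, b) in enumerate(zip(residual, base_probs)):
        yards = low + i
        p = max(r + b, 0.0)  # step 2; assumption: negatives are clamped to 0
        if yards > yards_to_goal:
            p = 0.0  # step 3: no gain beyond the goal line
        core.append(p)
    # Step 4: zero-pad the lower and upper ends.
    padded = [0.0] * (low - min_yards) + core + [0.0] * (max_yards - high + 1)
    total = sum(padded)
    if total <= 0.0:
        raise ValueError("all probabilities are zero; cannot normalize")
    probs = [p / total for p in padded]  # step 5: normalize without exp
    return list(accumulate(probs))  # step 6: cumulative sum
```

With a zero residual and a uniform base of 0.01, `to_cdf([0.0] * 100, [0.01] * 100, yards_to_goal=40)` spreads the mass evenly over -10 <= Yards <= 40 and reaches 1 at Yards = 40.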

Loss function

CRPS (Continuous Ranked Probability Score) with yards clipped to the range -10 to 29
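
For reference, a minimal single-play sketch of CRPS: the mean squared difference between the predicted CDF and the step function of the true yardage over the 199 values from -99 to 99. The clipping is interpreted here as clamping the true yardage into [-10, 29] before building the step function. That interpretation, and the function name `crps`, are assumptions.

```python
def crps(cdf, true_yards, clip=(-10, 29), min_yards=-99):
    """CRPS for one play; cdf[i] is P(Yards <= min_yards + i)."""
    lo, hi = clip
    y = min(max(true_yards, lo), hi)  # assumption: clamp the target, not the CDF
    total = 0.0
    for i, p in enumerate(cdf):
        n = min_yards + i
        step = 1.0 if n >= y else 0.0  # Heaviside step H(n - y)
        total += (p - step) ** 2
    return total / len(cdf)
```

A CDF that jumps from 0 to 1 exactly at the true yardage scores 0, the best possible value.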

Other settings

What did not work:

Dependencies available in Kaggle Kernel

Dependencies not available in Kaggle Kernel

Dependencies only for experimentation (not used in Kaggle Kernel)

How to run

1. Install dependencies

$ pip install torch pytorch-ignite pandas numpy pipelinex kedro mlflow

2. Clone this repository and run kaggle_nfl_main.py

$ git clone https://github.com/Minyus/kaggle_nfl.git
$ cd kaggle_nfl/kaggle/
$ python kaggle_nfl_main.py

Tested environment