Vegetation Health

Predicting vegetation health from precipitation and temperature

Introduction

This repository experiments with machine learning models that predict a drought index in East Africa, the Normalized Difference Vegetation Index (NDVI), from temperature and precipitation data.

Results

Models are trained on data from before 2016 and evaluated on data from 2016. The prediction target is vegetation health in June.

Vegetation health can also be withheld from the model's inputs, which shows how much the other features contribute on their own. These runs are reported as "no veg".

| Model | RMSE | RMSE (no veg) |
| --- | --- | --- |
| Linear Regression | 0.040 | 0.084 |
| Feedforward neural network | 0.038 | 0.070 |
| Recurrent neural network | 0.035 | 0.060 |
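
The evaluation protocol described above can be sketched roughly as follows. This is an illustration on synthetic data, not the repository's code: the column layout, the helper names `rmse` and `fit_predict`, and the least-squares baseline are all assumptions made for the example.

```python
import numpy as np


def rmse(y_true, y_pred):
    """Root mean squared error between two 1-D arrays."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def fit_predict(x_train, y_train, x_test):
    """Fit ordinary least squares with an intercept, then predict on x_test."""
    a_train = np.column_stack([x_train, np.ones(len(x_train))])
    a_test = np.column_stack([x_test, np.ones(len(x_test))])
    coef, *_ = np.linalg.lstsq(a_train, y_train, rcond=None)
    return a_test @ coef


# Synthetic stand-in data (hypothetical layout): the columns are
# [precipitation, temperature, vegetation health].
rng = np.random.default_rng(0)
years = rng.integers(2000, 2017, size=500)  # years 2000..2016 inclusive
features = rng.normal(size=(500, 3))
noise = rng.normal(scale=0.05, size=500)
target = (0.2 * features[:, 0] - 0.1 * features[:, 1]
          + 0.6 * features[:, 2] + noise)

train = years < 2016   # train on data from before 2016
test = years == 2016   # evaluate on data from 2016

rmse_full = rmse(
    target[test],
    fit_predict(features[train], target[train], features[test]),
)

# "No veg" variant: hide the vegetation health column from the model.
no_veg = features[:, :2]
rmse_no_veg = rmse(
    target[test],
    fit_predict(no_veg[train], target[train], no_veg[test]),
)
```

In this synthetic setup the target depends strongly on the vegetation column, so `rmse_no_veg` comes out larger than `rmse_full`, which mirrors the direction of the gap in the table.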

The models' predictions can also be compared visually with the ground truth (the example below is from the baseline linear regression):

<img src="figs/ndvi_results_logistic_regression.png" alt="Linear regression results" height="400px"/>

The effect of each input on the models' predictions is also investigated using SHAP values, in Jupyter notebooks for both the feedforward neural network and the recurrent neural network.
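
To give a feel for what a SHAP value measures, here is a small sketch for the one case where it has a simple closed form: a linear model with independent features. It is not the notebooks' code, which targets the neural networks. The function name `linear_shap`, the weights, and the data are hypothetical.

```python
import numpy as np


def linear_shap(weights, x, background):
    """Shapley values of a linear model f(x) = w.x + b for one sample x.

    Assuming independent features, the attribution of feature i is
    w_i * (x_i - mean of feature i over the background data).
    """
    return weights * (x - background.mean(axis=0))


rng = np.random.default_rng(0)
background = rng.normal(size=(200, 2))  # hypothetical [precipitation, temperature]
weights = np.array([0.3, -0.1])         # hypothetical fitted coefficients
bias = 0.5


def predict(inputs):
    """Linear model prediction for one sample or a batch of samples."""
    return inputs @ weights + bias


sample = np.array([1.5, -0.5])
contributions = linear_shap(weights, sample, background)

# The attributions sum to the gap between this prediction and the
# average prediction over the background data.
gap = predict(sample) - predict(background).mean()
```

Each entry of `contributions` says how far one feature pushes this prediction away from the average prediction. For neural networks there is no such closed form, so the values have to be approximated.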

Pipeline

Python Fire is used to generate a CLI.

Data cleaning

Normalize the values from the original CSV file, remove null values, and add a year series.

python run.py clean

A target can be selected by adding the flag --target, e.g. --target=ndvi_anomaly. By default, the target is ndvi. The selected target must be in predictor.preprocessing.VALUE_COLS.
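
The cleaning steps and the target check could look roughly like the sketch below. It is an assumption-laden illustration, not the code in `predictor.preprocessing`: the contents of `VALUE_COLS` beyond `ndvi` and `ndvi_anomaly`, the z-score normalization, and the function signature are all hypothetical.

```python
import numpy as np

# Hypothetical stand-in for predictor.preprocessing.VALUE_COLS; only
# "ndvi" and "ndvi_anomaly" are named in this README.
VALUE_COLS = ["ndvi", "ndvi_anomaly", "precip", "temp"]


def clean(rows, dates, target="ndvi"):
    """Clean a (n_records, len(VALUE_COLS)) float array.

    Returns the normalized values, a year series, and the target column.
    """
    if target not in VALUE_COLS:
        raise ValueError(f"target must be one of {VALUE_COLS}, got {target!r}")

    # Remove records that contain a null value in any column.
    keep = ~np.isnan(rows).any(axis=1)
    rows, dates = rows[keep], dates[keep]

    # Normalize each column to zero mean and unit variance (assumed scheme).
    normalized = (rows - rows.mean(axis=0)) / rows.std(axis=0)

    # Add a year series derived from the record dates.
    years = dates.astype("datetime64[Y]").astype(int) + 1970

    return normalized, years, normalized[:, VALUE_COLS.index(target)]


rows = np.array([
    [0.40, 0.02, 10.0, 25.0],
    [0.50, np.nan, 12.0, 26.0],  # dropped: contains a null value
    [0.60, -0.01, 14.0, 27.0],
])
dates = np.array(["2015-06-01", "2015-07-01", "2016-06-01"], dtype="datetime64[D]")

normalized, years, y = clean(rows, dates, target="ndvi")
```

Validating the target up front means an unsupported `--target` value fails immediately, before any data is processed.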

The original data is currently generated from datasets on the Oxford University cluster by the scripts in the data directory.

Data processing

Turn the CSV into NumPy arrays that can be passed to the models.

python run.py engineer
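
One plausible shape for this step is sketched below: a table of monthly records is sliced into fixed-length input windows, each paired with the target value of the following month. The function name `make_windows`, the window length, and the array layout are assumptions for illustration and may differ from what `engineer` actually produces.

```python
import numpy as np


def make_windows(series, target_col, window):
    """Slice a (n_months, n_features) array into model inputs and targets.

    Each input holds `window` consecutive months of all features. The
    matching target is the value of column `target_col` in the month
    immediately after the window.
    """
    inputs, targets = [], []
    for end in range(window, len(series)):
        inputs.append(series[end - window:end])
        targets.append(series[end, target_col])
    return np.stack(inputs), np.array(targets)


# Toy data: 12 months of 2 features, filled with 0..23 for easy inspection.
series = np.arange(24, dtype=float).reshape(12, 2)

x, y = make_windows(series, target_col=0, window=3)

# Flattened copy with one row per sample, for models that expect 2-D input.
x_flat = x.reshape(len(x), -1)
```

A recurrent network would typically consume the 3-D array `x` (samples, months, features), while a linear regression or feedforward network would take the flattened `x_flat`.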

Models

Three models are implemented: a baseline linear regression, a feedforward neural network, and a recurrent neural network. Select one with the --model_type flag.

python run.py train_model

Setup

Anaconda with Python 3.7 is used as the package manager. To set up an environment, install Anaconda and, from this directory, run

conda env create -f environment.yml

This will create an environment named vegetation_health with all the necessary packages to run the code. To activate this environment, run

conda activate vegetation_health

Additional Notes