OpenForest

OpenForest is an initiative to centralize open access forest monitoring datasets. This repository is open to contributions. It was motivated by our paper, "OpenForest: A data catalogue for machine learning in forest monitoring" (arXiv, 2023).

Every dataset listed in OpenForest meets these criteria, which are discussed in the corresponding article. If you want to add a new dataset, please make sure it meets the same criteria before proceeding to the next stage.

The OpenForest catalogue is available here.

If you find this catalogue useful for your research, please cite our paper:

@misc{ouaknine2023openforest,
      title={OpenForest: A data catalogue for machine learning in forest monitoring}, 
      author={Arthur Ouaknine and Teja Kattenborn and Etienne Laliberté and David Rolnick},
      year={2023},
      eprint={2311.00277},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
      }

Preliminary steps

  1. Clone the repo:
$ git clone https://github.com/RolnickLab/OpenForest.git
  2. Install the repo using pip:
$ cd OpenForest/
$ pip install -e .

The pip installation includes all the dependencies listed in the requirements file. If it does not, install them manually using pip or conda. Because the install is editable, you can modify the OpenForest code on the fly and import its functions and classes into other projects.
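
A quick way to confirm that the installation worked is to check that Python can locate the package. The sketch below uses only the standard library. The import name `openforest` is an assumption based on the folder name in the repository, so adjust it if the package is exposed under a different name.

```python
import importlib.util


def is_installed(package_name: str) -> bool:
    """Return True if Python can locate a top-level package with this name."""
    return importlib.util.find_spec(package_name) is not None


if __name__ == "__main__":
    # "openforest" is an assumed import name (taken from the folder name).
    print("openforest importable:", is_installed("openforest"))
```

If this prints `False` right after `pip install -e .`, the most likely causes are that a different Python environment is active or that the import name differs from the assumed one.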


How to add a new dataset?

  1. Create a new dataset file from the provided template PULL_REQUEST_TEMPLATE.yml.
$ cd OpenForest/
$ cp openforest/PULL_REQUEST_TEMPLATE.yml openforest/NEW_DATASET.yml
  2. Edit the dataset file as needed, respecting the expected type of each attribute.
  3. Run the tests locally to check the format of the file:
$ cd openforest/
$ bash tests/run_tests_to_add.sh NEW_DATASET.yml
  4. Create your branch:
$ git checkout -b my-branch
  5. Update OpenForest with your new dataset, then delete your YAML file after verifying that the update was applied correctly:
$ python scripts/add_new_dataset.py --dataset_file='NEW_DATASET.yml'
$ rm NEW_DATASET.yml
  6. Commit your changes, push them to your branch and create a Pull Request:
$ cd ..
$ git add .
$ git commit -m "meaningful commit message"
$ git push origin my-branch

Create a new Pull Request on the GitHub webpage of the repo. It will be validated and merged into the main branch as soon as possible if it meets the requirements and passes the tests.
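
Before running the official tests, it can help to check that the edited file still contains every attribute of the template. The sketch below is a rough, standard-library-only check and not part of OpenForest: it assumes that attributes are written as unindented `key: value` lines, and it does not parse YAML properly. The test script `tests/run_tests_to_add.sh` remains the authority on whether a file is valid.

```python
import re
from typing import Set

# Matches a top-level "key:" at the very start of a line (no indentation).
# Comment lines and indented (nested) lines are deliberately ignored.
_TOP_LEVEL_KEY = re.compile(r"^([A-Za-z_][\w-]*)\s*:")


def top_level_keys(yaml_text: str) -> Set[str]:
    """Collect the names of unindented keys found in a YAML-like text."""
    keys = set()
    for line in yaml_text.splitlines():
        match = _TOP_LEVEL_KEY.match(line)
        if match:
            keys.add(match.group(1))
    return keys


def missing_keys(template_text: str, dataset_text: str) -> Set[str]:
    """Return the template keys that are absent from the dataset file."""
    return top_level_keys(template_text) - top_level_keys(dataset_text)
```

To use it, read `PULL_REQUEST_TEMPLATE.yml` and `NEW_DATASET.yml` with `pathlib.Path(...).read_text()` and pass both strings to `missing_keys`. An empty set means no template attribute was dropped by accident.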


How to modify and explore OpenForest?

After any step of this section, you can push your changes and create a pull request according to step 4.

  1. Modify a dataset

If you want to modify the content of a dataset row in OpenForest, you need a dedicated YAML file, based on the template, that includes both the existing and the updated information. Note that the dataset_name attribute must match an existing one in OpenForest. First, run the following test with the corresponding YAML file MODIFIED_DATASET.yml:

$ cd openforest/
$ bash tests/run_tests_to_modify.sh 'MODIFIED_DATASET.yml'

If the tests pass, run the following commands to apply the modification, update OpenForest and delete your YAML file:

$ python scripts/modify_dataset.py --dataset_file='MODIFIED_DATASET.yml'
$ rm MODIFIED_DATASET.yml
  2. Remove a dataset

If you want to delete a dataset in OpenForest, you only need the name of your dataset. First, run the following test:

$ cd openforest/
$ bash tests/run_tests_to_remove.sh 'Name of your dataset'

If you pass the tests, run the following command line to remove the dataset and update OpenForest:

$ python scripts/remove_dataset.py --dataset_name='Name of your dataset'
  3. Print the dataset row and URL(s) to access it:

You can print the dataset information, including the URL(s) to access it, using the following command:

$ cd openforest/
$ python scripts/print_dataset.py --dataset_name='Name of your dataset'
  4. Commit your changes, push them to your branch and create a Pull Request:
$ cd ..
$ git add .
$ git commit -m "meaningful commit message"
$ git push origin my-branch

Create a new Pull Request on the GitHub webpage of the repo. It will be validated and merged into the main branch as soon as possible if it meets the requirements and passes the tests.
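
The modify and remove steps above share one pattern: run the test script first, and run the update script only if the tests pass. A minimal sketch of that pattern for the remove case follows. The helper names `remove_commands` and `run_in_order` are hypothetical and not part of OpenForest, and the sketch assumes that the test script exits with a non-zero status when a test fails.

```python
import subprocess
import sys
from typing import Callable, List, Sequence


def remove_commands(dataset_name: str) -> List[List[str]]:
    """Build the documented test and removal commands as argument lists.

    Passing argument lists (rather than one shell string) means a dataset
    name containing spaces needs no extra quoting.
    """
    test_cmd = ["bash", "tests/run_tests_to_remove.sh", dataset_name]
    apply_cmd = [sys.executable, "scripts/remove_dataset.py",
                 f"--dataset_name={dataset_name}"]
    return [test_cmd, apply_cmd]


def run_in_order(commands: Sequence[Sequence[str]],
                 run: Callable[[Sequence[str]], int] = subprocess.call) -> bool:
    """Run commands one by one and stop at the first non-zero exit code."""
    for cmd in commands:
        if run(cmd) != 0:
            return False
    return True
```

Called from the `openforest/` folder, `run_in_order(remove_commands("Name of your dataset"))` runs the removal script only when the tests succeed. The `run` parameter exists so that the logic can be exercised without touching the catalogue.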


How to add, modify or delete a data provider?

This section is under construction


To-do list