Raster Vision AWS Batch runner setup (for RV < 0.12)

This repository contains the deployment code that sets up the necessary AWS resources to utilize the AWS Batch runner in Raster Vision.

⚠️ For RV >= 0.12, the contents of this repo have moved into the main repo.

Note: Use the master branch of this repo together with the master branch (or latest Docker image tag) of Raster Vision, which contains the latest changes. For versions of this repo that correspond to stable, released versions of Raster Vision, see:

Using Batch is advantageous because it starts and stops instances automatically and runs jobs sequentially or in parallel according to the dependencies between them. In addition, this deployment sets up distinct CPU and GPU resources and utilizes spot instances, which is more cost-effective than always using a GPU on-demand instance. Deployment is driven via the AWS console using a CloudFormation template. This AWS Batch setup is an "advanced" option that assumes some familiarity with Docker, AWS IAM, named profiles, availability zones, EC2, ECR, CloudFormation, and Batch.

AWS Account Setup

To set up Batch using this repo, you will need to set up your AWS account so that:

AWS Credentials

Using the AWS CLI, create an AWS profile for the target AWS environment. For example, to create a profile named raster-vision:

$ aws --profile raster-vision configure
AWS Access Key ID [****************F2DQ]:
AWS Secret Access Key [****************TLJ/]:
Default region name [us-east-1]: us-east-1
Default output format [None]:

You will be prompted to enter your AWS credentials, along with a default region. The Access Key ID and Secret Access Key can be retrieved from the IAM console. These credentials will be used to authenticate calls to the AWS API when using Packer and the AWS CLI.
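As a quick sanity check, the following is a minimal sketch of where the profile ends up. The AWS CLI stores the keys under a [raster-vision] section in ~/.aws/credentials and the region under a [profile raster-vision] section in ~/.aws/config. The sample file contents and the read_profile helper are illustrative assumptions, not part of this repo, and the key values are placeholders.

```python
import configparser

# Illustrative file contents; the key values are placeholders, not real credentials.
SAMPLE_CREDENTIALS = """\
[raster-vision]
aws_access_key_id = AKIAEXAMPLEKEYID
aws_secret_access_key = exampleSecretAccessKey
"""

SAMPLE_CONFIG = """\
[profile raster-vision]
region = us-east-1
"""


def read_profile(credentials_text, config_text, profile):
    """Return the settings stored for a named profile, or None if it is missing.

    Hypothetical helper: it only parses text in the AWS CLI file layout.
    """
    credentials = configparser.ConfigParser()
    credentials.read_string(credentials_text)
    config = configparser.ConfigParser()
    config.read_string(config_text)

    if not credentials.has_section(profile):
        return None

    # In the config file, every profile except "default" is prefixed with "profile ".
    config_section = profile if profile == "default" else f"profile {profile}"
    return {
        "access_key_id": credentials.get(profile, "aws_access_key_id"),
        "region": config.get(config_section, "region", fallback=None),
    }


profile = read_profile(SAMPLE_CREDENTIALS, SAMPLE_CONFIG, "raster-vision")
print(profile["region"])
```

To inspect a real setup, the same function could be given the text of ~/.aws/credentials and ~/.aws/config instead of the sample strings.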

Deploying Batch resources

To deploy AWS Batch resources using AWS CloudFormation, start by logging into your AWS console. Then, follow the steps below:

Optional: Publish local Raster Vision images to ECR

If you set up ECR repositories during the CloudFormation setup (the "advanced user" option), then you will need to follow this step, which publishes local Raster Vision images to those ECR repositories. Every time you make a change to your local Raster Vision images and want to use those on Batch, you will need to run this step.

Run ./docker/build in the main Raster Vision repo to build local copies of the TensorFlow CPU, TensorFlow GPU, and PyTorch images.

In settings.mk, fill out the options shown in the table below.

Variable                         Description
RASTER_VISION_TF_CPU_IMAGE       The local Raster Vision TF CPU image to use.
RASTER_VISION_TF_GPU_IMAGE       The local Raster Vision TF GPU image to use.
RASTER_VISION_PYTORCH_GPU_IMAGE  The local Raster Vision PyTorch image to use.
ECR_TF_CPU_IMAGE                 The name of the ECR TF CPU image.
ECR_TF_GPU_IMAGE                 The name of the ECR TF GPU image.
ECR_PYTORCH_IMAGE                The name of the ECR PyTorch image.
ECR_IMAGE_TAG                    The ECR image tag to use, that is, the tag in ECR_TF_CPU_IMAGE, ECR_TF_GPU_IMAGE, and ECR_PYTORCH_IMAGE.

Run make publish-images to publish the images to your ECR repositories.
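What make publish-images runs internally is not shown here. As a rough sketch of the general pattern for publishing a local image to ECR, the snippet below builds the standard ECR image reference and the docker tag and docker push commands that would move a local image there. The account ID, repository name, local image name, and both helper functions are placeholders chosen for illustration, and the ECR login step is omitted.

```python
def ecr_image_uri(account_id, region, repo_name, tag):
    """Build an image reference in the standard ECR registry format."""
    return f"{account_id}.dkr.ecr.{region}.amazonaws.com/{repo_name}:{tag}"


def publish_commands(local_image, remote_image):
    """Return the docker commands that tag a local image and push it.

    The commands are returned as argument lists rather than executed, so the
    sketch can be inspected without Docker or AWS access.
    """
    return [
        ["docker", "tag", local_image, remote_image],
        ["docker", "push", remote_image],
    ]


# Placeholder values (assumptions): substitute your own account, region, and names.
remote = ecr_image_uri("123456789012", "us-east-1", "raster-vision-pytorch", "latest")
for command in publish_commands("raster-vision-pytorch:latest", remote):
    print(" ".join(command))
```

Each argument list could be passed to subprocess.run(command, check=True) once Docker has been authenticated against the registry.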

Update Raster Vision configuration

Finally, make sure to update your Raster Vision configuration with the Batch resources that were created.

Deploy new job definitions

When a user starts working on a new RV-based project (or a new user starts working on an existing RV-based project), they will often want to publish a custom Docker image to ECR and use it when running on Batch. To facilitate this, there is a separate CloudFormation template for creating new job definitions. Each user/project pair is identified by a Namespace string. For each Namespace, a CPU and a GPU job definition are created, both pointing to a specified ECR repo and using that Namespace as the image tag. After creating these new resources, the image should be published to repo:namespace on ECR, and the new job definitions should be placed in a project-specific RV profile file.
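The relationship between a Namespace, its image, and its two job definitions can be sketched as follows. Only the repo:namespace image reference comes from the description above. The job definition names, the repository URI, and the NamespaceResources and resources_for names are hypothetical; the actual names are determined by the CloudFormation template.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NamespaceResources:
    """Resources associated with one user/project pair."""
    namespace: str
    image: str
    cpu_job_definition: str
    gpu_job_definition: str


def resources_for(repo_uri, namespace):
    """Derive the per-namespace resources for a given ECR repo."""
    return NamespaceResources(
        namespace=namespace,
        # The image is published to repo:namespace, as described above.
        image=f"{repo_uri}:{namespace}",
        # Hypothetical naming scheme, for illustration only.
        cpu_job_definition=f"{namespace}-cpu",
        gpu_job_definition=f"{namespace}-gpu",
    )


# Placeholder repository URI and namespace (assumptions).
res = resources_for(
    "123456789012.dkr.ecr.us-east-1.amazonaws.com/my-project", "alice-buildings"
)
print(res.image)
```

Keeping the Namespace as the single input makes it clear that both job definitions and the image tag change together when a new user or project is added.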