gcpbatchtracker

DRMAA2 JobTracker implementation for Google Batch

Experimental Google Batch support for DRMAA2os.

How gcpbatchtracker Works

The project is intended to be embedded as a backend in https://github.com/dgruber/drmaa2os

What gcpbatchtracker is

It is a basic DRMAA2 implementation for Google Batch, written in Go. The DRMAA2 JobTemplate is used for submitting Google Batch jobs, and the DRMAA2 JobInfo struct is used for retrieving the status of a job. The Google Batch job state model is mapped to the job states defined by the DRMAA2 specification.

How to use it

See the examples directory, which uses the JobTracker interface directly; a minimal sketch follows below.
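
As a rough sketch of direct usage (the constructor name and its parameters below are assumptions for illustration only; the real API is shown in the examples directory), submitting a container job could look like this:

    package main

    import (
        "fmt"
        "time"

        "github.com/dgruber/drmaa2interface"
        "github.com/dgruber/gcpbatchtracker"
    )

    func main() {
        // NOTE: constructor name and parameters are assumed for illustration;
        // see the examples directory for how the tracker is actually created.
        tracker, err := gcpbatchtracker.NewGCPBatchTracker("testsession",
            "my-gcp-project", "us-central1")
        if err != nil {
            panic(err)
        }

        // AddJob is part of the drmaa2os JobTracker interface which this
        // project implements.
        jobID, err := tracker.AddJob(drmaa2interface.JobTemplate{
            JobName:           "sleepjob",
            JobCategory:       "busybox:latest", // container image
            RemoteCommand:     "/bin/sh",
            Args:              []string{"-c", "sleep 10"},
            CandidateMachines: []string{"e2-standard-4"},
            MinSlots:          1,
            MaxSlots:          1,
        })
        if err != nil {
            panic(err)
        }

        // Wait is also part of the JobTracker interface (its signature may
        // differ between drmaa2os versions).
        err = tracker.Wait(jobID, 10*time.Minute,
            drmaa2interface.Done, drmaa2interface.Failed)
        fmt.Println("job", jobID, "finished, wait error:", err)
    }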

Converting a DRMAA2 Job Template to a Google Batch Job

| DRMAA2 JobTemplate | Google Batch Job |
| --- | --- |
| RemoteCommand | Command to execute in the container, or a script, or a script path (depending on JobCategory) |
| Args | For a container: the arguments of the command (if RemoteCommand is empty, the arguments of the entrypoint) |
| CandidateMachines[0] | Machine type; when prefixed with "template:", the instance template with that name is used |
| JobCategory | Container image, or $script$ / $scriptpath$ for other runnables, which then interpret RemoteCommand as a script or a script path |
| JobName | Job ID |
| AccountingID | Sets a tag "accounting" |
| MinSlots | Specifies the parallelism (how many tasks run in parallel) |
| MaxSlots | Specifies the total number of tasks to run; for MPI set MinSlots = MaxSlots |
| MinPhysMemory | Memory to request in MB; should be increased from the default up to the full machine size |
| ResourceLimits | Keys can be "cpumilli", "bootdiskmib", and "runtime" (a runtime limit like "30m" for 30 minutes) |

Override the "cpumilli" resource limit to get the full amount of resources when running just one task per machine (e.g. 8000 for 8 cores)!

For StageInFiles and StageOutFiles see below.
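
To make the mapping above concrete, here is a hedged sketch of a job template that runs eight tasks, one per 8-core machine; the container image, command, and machine type are placeholder values:

    // Sketch of a JobTemplate using the mapping above; image, command,
    // and machine type are illustrative placeholders.
    jt := drmaa2interface.JobTemplate{
        JobName:           "pi-calculation",
        JobCategory:       "mycontainerimage:latest", // container image
        RemoteCommand:     "/app/compute",            // command inside the container
        Args:              []string{"--iterations", "1000000"},
        CandidateMachines: []string{"c2-standard-8"}, // or "template:my-instance-template"
        MinSlots:          8,     // run 8 tasks in parallel
        MaxSlots:          8,     // 8 tasks in total (MinSlots == MaxSlots, MPI style)
        MinPhysMemory:     30000, // MB; close to the full machine memory
        ResourceLimits: map[string]string{
            "cpumilli": "8000", // full 8 cores, as only one task runs per machine
            "runtime":  "30m",  // runtime limit
        },
    }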

When a container is used, the following paths are always mounted from the host:

    "/etc/cloudbatch-taskgroup-hosts:/etc/cloudbatch-taskgroup-hosts",
    "/etc/ssh:/etc/ssh",
    "/root/.ssh:/root/.ssh",

For a container, default Docker runtime options are set; they can be overridden with the "docker_options" job template extension (see below).

The default log destination is Cloud Logging. If "OutputPath" is set, the logs policy is changed to LogsPolicy_PATH with OutputPath as the destination.
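
As a small hedged illustration (the path below is just an assumed example), redirecting logs away from Cloud Logging only requires setting OutputPath:

    // Sketch: route job logs to a path instead of Cloud Logging.
    jt := drmaa2interface.JobTemplate{
        JobCategory:   "busybox:latest",
        RemoteCommand: "/bin/sh",
        Args:          []string{"-c", "echo hello"},
        OutputPath:    "/mnt/share/logs", // assumed example path reachable from the job
    }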

JobTemplate Extensions

| DRMAA2 JobTemplate Extension Key | DRMAA2 JobTemplate Extension Value |
| --- | --- |
| ExtensionProlog / "prolog" | String containing a prolog script that is executed on the machine level before the job starts |
| ExtensionEpilog / "epilog" | String containing an epilog script that is executed on the machine level after the job ends successfully |
| ExtensionSpot / "spot" | "true"/"t"/... when the machine should be a spot instance |
| ExtensionAccelerators / "accelerators" | "Amount*Accelerator name" for the machine (like "1*nvidia-tesla-v100") |
| ExtensionTasksPerNode / "tasks_per_node" | Number of tasks per node |
| ExtensionDockerOptions / "docker_options" | Overrides the docker run options when a container image is used |
| ExtensionGoogleSecretEnv / "secret_env" | Used for populating environment variables from Google Secret Manager; please use SetSecretEnvironmentVariables() |
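
A hedged sketch of how these extensions can be set through the JobTemplate's ExtensionList map (keys taken from the table above, values are placeholders):

    // Sketch: request a spot machine with one GPU and two tasks per node.
    jt := drmaa2interface.JobTemplate{
        JobCategory:   "mycontainerimage:latest",
        RemoteCommand: "/app/train",
    }
    jt.ExtensionList = map[string]string{
        "spot":           "true",                // use a spot instance
        "accelerators":   "1*nvidia-tesla-v100", // one V100 per machine
        "tasks_per_node": "2",                   // two tasks per node
        "prolog":         "#!/bin/bash\necho starting on $(hostname)",
    }

For secret environment variables, prefer SetSecretEnvironmentVariables() over setting the "secret_env" key by hand.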

JobInfo Fields

| DRMAA2 JobInfo | Batch Job |
| --- | --- |
| Slots | Parallelism |

Job Control Mapping

There is currently no known way to put a job on hold, to suspend it, or to release it. Terminating a job deletes it.

Job State Mapping

| DRMAA2 State | Batch Job State |
| --- | --- |
| Done | JobStatus_SUCCEEDED |
| Failed | JobStatus_FAILED |
| Suspended | - |
| Running | JobStatus_RUNNING, JobStatus_DELETION_IN_PROGRESS |
| Queued | JobStatus_QUEUED, JobStatus_SCHEDULED |
| Undetermined | JobStatus_STATE_UNSPECIFIED |
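
On the DRMAA2 side these states arrive as drmaa2interface constants; a minimal sketch of interpreting them (how the state is fetched from the tracker is left out here):

    // Sketch: interpreting a DRMAA2 job state returned for a Batch job.
    func describe(state drmaa2interface.JobState) string {
        switch state {
        case drmaa2interface.Done:
            return "finished successfully (JobStatus_SUCCEEDED)"
        case drmaa2interface.Failed:
            return "failed (JobStatus_FAILED)"
        case drmaa2interface.Running:
            return "running (or deletion in progress)"
        case drmaa2interface.Queued:
            return "queued or scheduled"
        case drmaa2interface.Undetermined:
            return "state unspecified (JobStatus_STATE_UNSPECIFIED)"
        default:
            return "not mapped by this backend"
        }
    }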

File staging using the Job Template

NFS (Google Filestore) and GCS are supported.

For NFS in containers, not only directories but also single files can be specified. In the file case, the containing directory is mounted on the host, and from there the file is mounted into the container at the path given by the key. In the directory case the key requires a leading "/".

    StageInFiles: map[string]string{
        "/etc/script.sh": "nfs:10.20.30.40:/filestore/user/dir/script.sh",
        "/mnt/dir":       "nfs:10.20.30.40:/filestore/user/dir/",
        "/somedir":       "gs://benchmarkfiles", // mount a bucket into container or host
    },

StageOutFiles creates the bucket before the job is submitted if it does not exist yet. If that fails, the job submission call fails. Currently only gs:// targets are evaluated in the StageOutFiles map.

    StageOutFiles: map[string]string{
        "/tmp/joboutput": "gs://outputbucket",
    },

Examples

See the examples directory.