Garm External Provider For AWS
The AWS external provider allows garm to create Linux and Windows runners on top of AWS virtual machines.
Build
Clone the repo:
git clone https://github.com/cloudbase/garm-provider-aws
Build the binary:
cd garm-provider-aws
go build .
Copy the binary to the same system where garm is running, and point to it in the garm config.
Configure
The config file for this external provider is a simple TOML file used to configure the AWS region, the subnet and the credentials the provider needs to spin up virtual machines.
region = "eu-central-1"
subnet_id = "sample_subnet_id"
[credentials]
# Allowed values are: static, role
# When using IAM roles, you can omit the [credentials.static] section
credential_type = "static"
[credentials.static]
access_key_id = "sample_access_key_id"
secret_access_key = "sample_secret_access_key"
session_token = "sample_session_token"
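A minimal Go sketch of how the file above could be modelled and sanity-checked before use. The struct names, the toml tags (which a TOML decoder such as github.com/BurntSushi/toml would read) and the validation rules are illustrative assumptions, not the provider's actual code; only the key names and the static/role values come from the sample config.

```go
package main

import (
	"errors"
	"fmt"
)

// StaticCredentials mirrors the [credentials.static] table.
type StaticCredentials struct {
	AccessKeyID     string `toml:"access_key_id"`
	SecretAccessKey string `toml:"secret_access_key"`
	SessionToken    string `toml:"session_token"`
}

// Credentials mirrors the [credentials] table.
type Credentials struct {
	CredentialType string            `toml:"credential_type"` // "static" or "role"
	Static         StaticCredentials `toml:"static"`
}

// Config mirrors the top-level keys of garm-provider-aws.toml.
type Config struct {
	Region      string      `toml:"region"`
	SubnetID    string      `toml:"subnet_id"`
	Credentials Credentials `toml:"credentials"`
}

// Validate applies illustrative checks; the provider's real rules may differ.
func (c Config) Validate() error {
	if c.Region == "" {
		return errors.New("region is required")
	}
	switch c.Credentials.CredentialType {
	case "role":
		// With IAM roles, [credentials.static] may be omitted.
		return nil
	case "static":
		s := c.Credentials.Static
		// Assumption: session_token is optional, the key pair is not.
		if s.AccessKeyID == "" || s.SecretAccessKey == "" {
			return errors.New("static credentials need access_key_id and secret_access_key")
		}
		return nil
	default:
		return fmt.Errorf("unsupported credential_type %q (allowed: static, role)", c.Credentials.CredentialType)
	}
}

func main() {
	cfg := Config{
		Region:      "eu-central-1",
		SubnetID:    "sample_subnet_id",
		Credentials: Credentials{CredentialType: "role"},
	}
	fmt.Println(cfg.Validate()) // prints <nil>
}
```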
If you're running GARM on EKS, you can use the IAM role assigned to the EKS nodes by setting credential_type to role. For this to work, the environment variables prefixed with AWS_ need to be visible to the provider. By default, GARM does not pass any environment variables through to external providers; it only sets the variables that control the operation of the provider itself. To pass variables through, set the environment_variables option in the provider configuration. For example:
[[provider]]
name = "ec2_external"
description = "external provider for AWS"
provider_type = "external"
disable_jit_config = false
[provider.external]
config_file = "/etc/garm/garm-provider-aws.toml"
provider_executable = "/opt/garm/providers/garm-provider-aws"
# This option will pass all environment variables that start with AWS_ to the provider.
# To pass in individual variables, you can add the entire name to the list.
environment_variables = ["AWS_"]
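The pass-through rule described above can be pictured as a prefix filter over the process environment. filterEnv is a hypothetical helper written for illustration; GARM's real matching logic may differ, so treat this as a sketch of the documented behavior, not its implementation.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// filterEnv returns the entries of environ ("NAME=value") whose variable
// name starts with one of the allowed prefixes. A full variable name also
// works, since every name is a prefix of itself.
func filterEnv(environ []string, allowed []string) []string {
	var out []string
	for _, kv := range environ {
		name, _, found := strings.Cut(kv, "=")
		if !found {
			continue
		}
		for _, prefix := range allowed {
			if strings.HasPrefix(name, prefix) {
				out = append(out, kv)
				break
			}
		}
	}
	return out
}

func main() {
	// Same value as environment_variables = ["AWS_"] in the example above.
	for _, kv := range filterEnv(os.Environ(), []string{"AWS_"}) {
		fmt.Println(kv)
	}
}
```

With pure prefix matching, listing a full name such as AWS_REGION would also match a longer name like AWS_REGION_FOO; if that matters, an exact-match list would be the safer design.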
Creating a pool
After you add it to garm as an external provider, you need to create a pool that uses it. Assuming you named your external provider aws in the garm config (the sample configuration above uses ec2_external; pass whichever name you chose to --provider-name), the following command should create a new pool:
garm-cli pool create \
--os-type windows \
--os-arch amd64 \
--enabled=true \
--flavor t2.small \
--image ami-0d5f36b04ca291a9f \
--min-idle-runners 0 \
--repo 5b4f2fb0-3485-45d6-a6b3-545bad933df3 \
--tags aws,windows \
--provider-name aws
This will create a new Windows runner pool for the repo with ID 5b4f2fb0-3485-45d6-a6b3-545bad933df3 on AWS, using the image with AMI ID ami-0d5f36b04ca291a9f and instance type t2.small. You can, of course, tweak the values in the above command to suit your needs.
Here is an example for a Linux pool:
garm-cli pool create \
--os-type linux \
--os-arch amd64 \
--enabled=true \
--flavor t2.small \
--image ami-04c0bb88603bf2e3d \
--min-idle-runners 0 \
--repo 5b4f2fb0-3485-45d6-a6b3-545bad933df3 \
--tags aws,ubuntu \
--provider-name aws
Always use a recent image. For example, to list the available Windows Server 2022 images, run something like:
aws ec2 describe-images --region eu-central-1 --owners self amazon --filters "Name=platform,Values=windows" "Name=name,Values=*Windows_Server-2022*"
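Picking the newest image from that output can be automated. The sketch below assumes the command was run with JSON output (--output json) saved to a file, and reads only the ImageId, Name and CreationDate fields of each entry in Images; newestImage is a hypothetical helper. The AWS CLI can also do this on its own with a JMESPath query such as --query 'sort_by(Images, &CreationDate)[-1].ImageId'.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"os"
	"time"
)

// image holds the subset of describe-images fields used here.
type image struct {
	ImageID      string `json:"ImageId"`
	Name         string `json:"Name"`
	CreationDate string `json:"CreationDate"`
}

type describeImagesOutput struct {
	Images []image `json:"Images"`
}

// newestImage returns the image with the most recent CreationDate.
func newestImage(raw []byte) (image, error) {
	var out describeImagesOutput
	if err := json.Unmarshal(raw, &out); err != nil {
		return image{}, err
	}
	var best image
	var bestTime time.Time
	found := false
	for _, img := range out.Images {
		// CreationDate looks like 2024-05-15T07:28:51.000Z; time.Parse
		// accepts the fractional seconds even though the layout omits them.
		t, err := time.Parse(time.RFC3339, img.CreationDate)
		if err != nil {
			return image{}, fmt.Errorf("image %s: %w", img.ImageID, err)
		}
		if !found || t.After(bestTime) {
			best, bestTime, found = img, t, true
		}
	}
	if !found {
		return image{}, errors.New("no images in describe-images output")
	}
	return best, nil
}

func main() {
	// Usage: aws ec2 describe-images ... --output json > images.json
	//        go run . images.json
	if len(os.Args) < 2 {
		fmt.Fprintln(os.Stderr, "usage: newest-ami <describe-images.json>")
		return
	}
	raw, err := os.ReadFile(os.Args[1])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	img, err := newestImage(raw)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Printf("%s\t%s\t%s\n", img.ImageID, img.CreationDate, img.Name)
}
```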
Tweaking the provider
Garm supports sending opaque, JSON-encoded configs to the IaaS providers it hooks into. This allows providers to implement provider-specific functionality that doesn't necessarily translate well to other providers: features that exist on AWS may not exist on Azure or OpenStack, and vice versa.
To this end, this provider supports the following extra specs schema:
{
"$schema": "http://cloudbase.it/garm-provider-aws/schemas/extra_specs#",
"type": "object",
"description": "Schema defining supported extra specs for the Garm AWS Provider",
"properties": {
"subnet_id": {
"type": "string",
"pattern": "^subnet-[0-9a-fA-F]{17}$"
},
"ssh_key_name": {
"type": "string",
"description": "The name of the Key Pair to use for the instance."
},
"iops": {
"type": "integer",
"description": "Specifies the number of IOPS (Input/Output Operations Per Second) provisioned for the volume. Required for io1 and io2 volumes. Optional for gp3 volumes."
},
"throughput": {
"type": "integer",
"maximum": 1000,
"minimum": 125,
"description": "Specifies the throughput (MiB/s) provisioned for the volume. Valid only for gp3 volumes."
},
"volume_size": {
"type": "integer",
"description": "Specifies the size of the volume in GiB."
},
"volume_type": {
"type": "string",
"enum": [
"gp2",
"gp3",
"io1",
"io2",
"st1",
"sc1",
"standard"
],
"description": "Specifies the EBS volume type."
},
"security_group_ids": {
"type": "array",
"description": "The security groups IDs to associate with the instance. Default: Amazon EC2 uses the default security group.",
"items": {
"type": "string"
}
},
"disable_updates": {
"type": "boolean",
"description": "Disable automatic updates on the VM."
},
"enable_boot_debug": {
"type": "boolean",
"description": "Enable boot debug on the VM."
},
"extra_packages": {
"type": "array",
"description": "Extra packages to install on the VM.",
"items": {
"type": "string"
}
},
"runner_install_template": {
"type": "string",
"description": "This option can be used to override the default runner install template. If used, the caller is responsible for the correctness of the template as well as the suitability of the template for the target OS. Use the extra_context extra spec if your template has variables in it that need to be expanded."
},
"extra_context": {
"type": "object",
"description": "Extra context that will be passed to the runner_install_template.",
"additionalProperties": {
"type": "string"
}
},
"pre_install_scripts": {
"type": "object",
"description": "A map of pre-install scripts that will be run before the runner install script. These will run as root and can be used to prep a generic image before we attempt to install the runner. The key of the map is the name of the script as it will be written to disk. The value is a byte array with the contents of the script."
}
},
"additionalProperties": false
}
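To make the schema concrete, here is a hypothetical Go mirror of it that rejects unknown keys (as "additionalProperties": false does) and checks the subnet_id pattern, the throughput bounds and the volume_type enum. The type and function names are assumptions. Decoding runner_install_template and pre_install_scripts into []byte (which encoding/json reads from base64 strings) is also an assumption, based on the base64 values used for pre_install_scripts in the example that follows; the provider's real types may differ.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"regexp"
)

// ExtraSpecs is a hypothetical Go mirror of the extra specs schema.
// Pointers distinguish "not set" from zero values.
type ExtraSpecs struct {
	SubnetID         *string  `json:"subnet_id,omitempty"`
	SSHKeyName       *string  `json:"ssh_key_name,omitempty"`
	IOPS             *int64   `json:"iops,omitempty"`
	Throughput       *int64   `json:"throughput,omitempty"`
	VolumeSize       *int64   `json:"volume_size,omitempty"`
	VolumeType       *string  `json:"volume_type,omitempty"`
	SecurityGroupIDs []string `json:"security_group_ids,omitempty"`
	DisableUpdates   *bool    `json:"disable_updates,omitempty"`
	EnableBootDebug  *bool    `json:"enable_boot_debug,omitempty"`
	ExtraPackages    []string `json:"extra_packages,omitempty"`
	// Assumption: []byte fields arrive as base64 strings.
	RunnerInstallTemplate []byte            `json:"runner_install_template,omitempty"`
	ExtraContext          map[string]string `json:"extra_context,omitempty"`
	PreInstallScripts     map[string][]byte `json:"pre_install_scripts,omitempty"`
}

var subnetRe = regexp.MustCompile(`^subnet-[0-9a-fA-F]{17}$`)

// parseExtraSpecs decodes raw JSON and applies the schema's constraints.
func parseExtraSpecs(raw []byte) (ExtraSpecs, error) {
	var spec ExtraSpecs
	dec := json.NewDecoder(bytes.NewReader(raw))
	dec.DisallowUnknownFields() // like "additionalProperties": false
	if err := dec.Decode(&spec); err != nil {
		return ExtraSpecs{}, fmt.Errorf("decoding extra specs: %w", err)
	}
	if spec.SubnetID != nil && !subnetRe.MatchString(*spec.SubnetID) {
		return ExtraSpecs{}, fmt.Errorf("subnet_id %q does not match %s", *spec.SubnetID, subnetRe)
	}
	if spec.Throughput != nil && (*spec.Throughput < 125 || *spec.Throughput > 1000) {
		return ExtraSpecs{}, fmt.Errorf("throughput %d outside 125-1000 MiB/s", *spec.Throughput)
	}
	if spec.VolumeType != nil {
		switch *spec.VolumeType {
		case "gp2", "gp3", "io1", "io2", "st1", "sc1", "standard":
		default:
			return ExtraSpecs{}, fmt.Errorf("unsupported volume_type %q", *spec.VolumeType)
		}
	}
	return spec, nil
}

func main() {
	spec, err := parseExtraSpecs([]byte(`{"subnet_id":"subnet-0e7a29d5cf6e54789","volume_type":"gp3","throughput":200}`))
	if err != nil {
		fmt.Println("invalid:", err)
		return
	}
	fmt.Println(*spec.SubnetID, *spec.VolumeType, *spec.Throughput)
}
```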
An example extra specs json would look like this:
{
"subnet_id":"subnet-0e7a29d5cf6e54789",
"ssh_key_name":"Garm-test",
"iops": 3000,
"throughput": 200,
"volume_size": 50,
"volume_type": "gp3",
"security_group_ids": ["sg-018c35963edfb1cce", "sg-018c35963edfb1cee"],
"disable_updates": true,
"enable_boot_debug": true,
"extra_context": {
"GolangDownloadURL": "https://go.dev/dl/go1.22.4.linux-amd64.tar.gz"
},
"extra_packages": [
"apg",
"tmux"
],
"pre_install_scripts": {
"01-script": "IyEvYmluL2Jhc2gKCgplY2hvICJIZWxsbyBmcm9tICQwIiA+PiAvMDEtc2NyaXB0LnR4dAo=",
"02-script": "IyEvYmluL2Jhc2gKCgplY2hvICJIZWxsbyBmcm9tICQwIiA+PiAvMDItc2NyaXB0LnR4dAo="
},
"runner_install_template": "#!/bin/bash

set -e
set -o pipefail

{{- if .EnableBootDebug }}
set -x
{{- end }}

CALLBACK_URL="{{ .CallbackURL }}"
METADATA_URL="{{ .MetadataURL }}"
BEARER_TOKEN="{{ .CallbackToken }}"

if [ -z "$METADATA_URL" ];then
	echo "no token is available and METADATA_URL is not set"
	exit 1
fi

function call() {
	PAYLOAD="$1"
	[[ $CALLBACK_URL =~ ^(.*)/status(/)?$ ]] || CALLBACK_URL="${CALLBACK_URL}/status"
	curl --retry 5 --retry-delay 5 --retry-connrefused --fail -s -X POST -d "${PAYLOAD}" -H 'Accept: application/json' -H "Authorization: Bearer ${BEARER_TOKEN}" "${CALLBACK_URL}" || echo "failed to call home: exit code ($?)"
}

function systemInfo() {
	if [ -f "/etc/os-release" ];then
		. /etc/os-release
	fi
	OS_NAME=${NAME:-""}
	OS_VERSION=${VERSION_ID:-""}
	AGENT_ID=${1:-null}
	# strip status from the callback url
	[[ $CALLBACK_URL =~ ^(.*)/status(/)?$ ]] && CALLBACK_URL="${BASH_REMATCH[1]}" || true
	SYSINFO_URL="${CALLBACK_URL}/system-info/"
	PAYLOAD="{\"os_name\": \"$OS_NAME\", \"os_version\": \"$OS_VERSION\", \"agent_id\": $AGENT_ID}"
	curl --retry 5 --retry-delay 5 --retry-connrefused --fail -s -X POST -d "${PAYLOAD}" -H 'Accept: application/json' -H "Authorization: Bearer ${BEARER_TOKEN}" "${SYSINFO_URL}" || true
}

function sendStatus() {
	MSG="$1"
	call "{\"status\": \"installing\", \"message\": \"$MSG\"}"
}

function success() {
	MSG="$1"
	ID=${2:-null}
	call "{\"status\": \"idle\", \"message\": \"$MSG\", \"agent_id\": $ID}"
}

function fail() {
	MSG="$1"
	call "{\"status\": \"failed\", \"message\": \"$MSG\"}"
	exit 1
}

# This will echo the version number in the filename. Given a file name like: actions-runner-osx-x64-2.299.1.tar.gz
# this will output: 2.299.1
function getRunnerVersion() {
	FILENAME="{{ .FileName }}"
	[[ $FILENAME =~ ([0-9]+\.[0-9]+\.[0-9]+) ]]
	echo $BASH_REMATCH
}

function getCachedToolsPath() {
	CACHED_RUNNER="/opt/cache/actions-runner/latest"
	if [ -d "$CACHED_RUNNER" ];then
		echo "$CACHED_RUNNER"
		return 0
	fi

	VERSION=$(getRunnerVersion)
	if [ -z "$VERSION" ]; then
		return 0
	fi

	CACHED_RUNNER="/opt/cache/actions-runner/$VERSION"
	if [ -d "$CACHED_RUNNER" ];then
		echo "$CACHED_RUNNER"
		return 0
	fi
	return 0
}

function downloadAndExtractRunner() {
	sendStatus "downloading tools from {{ .DownloadURL }}"
	if [ ! -z "{{ .TempDownloadToken }}" ]; then
	TEMP_TOKEN="Authorization: Bearer {{ .TempDownloadToken }}"
	fi
	curl --retry 5 --retry-delay 5 --retry-connrefused --fail -L -H "${TEMP_TOKEN}" -o "/home/{{ .RunnerUsername }}/{{ .FileName }}" "{{ .DownloadURL }}" || fail "failed to download tools"
	mkdir -p /home/{{ .RunnerUsername }}/actions-runner || fail "failed to create actions-runner folder"
	sendStatus "extracting runner"
	tar xf "/home/{{ .RunnerUsername }}/{{ .FileName }}" -C /home/{{ .RunnerUsername }}/actions-runner/ || fail "failed to extract runner"
	# chown {{ .RunnerUsername }}:{{ .RunnerGroup }} -R /home/{{ .RunnerUsername }}/actions-runner/ || fail "failed to change owner"
}

CACHED_RUNNER=$(getCachedToolsPath)
if [ -z "$CACHED_RUNNER" ];then
	downloadAndExtractRunner
	sendStatus "installing dependencies"
	cd /home/{{ .RunnerUsername }}/actions-runner
	sudo ./bin/installdependencies.sh || fail "failed to install dependencies"
else
	sendStatus "using cached runner found in $CACHED_RUNNER"
	sudo cp -a "$CACHED_RUNNER"  "/home/{{ .RunnerUsername }}/actions-runner"
	sudo chown {{ .RunnerUsername }}:{{ .RunnerGroup }} -R "/home/{{ .RunnerUsername }}/actions-runner" || fail "failed to change owner"
	cd /home/{{ .RunnerUsername }}/actions-runner
fi


sendStatus "configuring runner"
{{- if .UseJITConfig }}
function getRunnerFile() {
	curl --retry 5 --retry-delay 5 \
		--retry-connrefused --fail -s \
		-X GET -H 'Accept: application/json' \
		-H "Authorization: Bearer ${BEARER_TOKEN}" \
		"${METADATA_URL}/$1" -o "$2"
}

sendStatus "downloading JIT credentials"
getRunnerFile "credentials/runner" "/home/{{ .RunnerUsername }}/actions-runner/.runner" || fail "failed to get runner file"
getRunnerFile "credentials/credentials" "/home/{{ .RunnerUsername }}/actions-runner/.credentials" || fail "failed to get credentials file"
getRunnerFile "credentials/credentials_rsaparams" "/home/{{ .RunnerUsername }}/actions-runner/.credentials_rsaparams" || fail "failed to get credentials_rsaparams file"
getRunnerFile "system/service-name" "/home/{{ .RunnerUsername }}/actions-runner/.service" || fail "failed to get service name file"
sed -i 's/$/\.service/' /home/{{ .RunnerUsername }}/actions-runner/.service

SVC_NAME=$(cat /home/{{ .RunnerUsername }}/actions-runner/.service)

sendStatus "generating systemd unit file"
getRunnerFile "systemd/unit-file?runAsUser={{ .RunnerUsername }}" "$SVC_NAME" || fail "failed to get service file"
sudo mv $SVC_NAME /etc/systemd/system/ || fail "failed to move service file"
sudo chown root:root /etc/systemd/system/$SVC_NAME || fail "failed to change owner"
if [ -e "/sys/fs/selinux" ];then
	sudo chcon -h system_u:object_r:systemd_unit_file_t:s0 /etc/systemd/system/$SVC_NAME || fail "failed to change selinux context"
fi

sendStatus "enabling runner service"
cp /home/{{ .RunnerUsername }}/actions-runner/bin/runsvc.sh /home/{{ .RunnerUsername }}/actions-runner/ || fail "failed to copy runsvc.sh"
sudo chown {{ .RunnerUsername }}:{{ .RunnerGroup }} -R /home/{{ .RunnerUsername }} || fail "failed to change owner"
sudo systemctl daemon-reload || fail "failed to reload systemd"
sudo systemctl enable $SVC_NAME
{{- else}}

GITHUB_TOKEN=$(curl --retry 5 --retry-delay 5 --retry-connrefused --fail -s -X GET -H 'Accept: application/json' -H "Authorization: Bearer ${BEARER_TOKEN}" "${METADATA_URL}/runner-registration-token/")

set +e
attempt=1
while true; do
	ERROUT=$(mktemp)
	{{- if .GitHubRunnerGroup }}
	./config.sh --unattended --url "{{ .RepoURL }}" --token "$GITHUB_TOKEN" --runnergroup {{.GitHubRunnerGroup}} --name "{{ .RunnerName }}" --labels "{{ .RunnerLabels }}" --no-default-labels --ephemeral 2>$ERROUT
	{{- else}}
	./config.sh --unattended --url "{{ .RepoURL }}" --token "$GITHUB_TOKEN" --name "{{ .RunnerName }}" --labels "{{ .RunnerLabels }}" --no-default-labels --ephemeral 2>$ERROUT
	{{- end}}
	if [ $? -eq 0 ]; then
		rm $ERROUT || true
		sendStatus "runner successfully configured after $attempt attempt(s)"
		break
	fi
	LAST_ERR=$(cat $ERROUT)
	echo "$LAST_ERR"

	# if the runner is already configured, remove it and try again. In the past configuring a runner
	# managed to register it but timed out later, resulting in an error.
	./config.sh remove --token "$GITHUB_TOKEN" || true

	if [ $attempt -gt 5 ];then
		rm $ERROUT || true
		fail "failed to configure runner: $LAST_ERR"
	fi

	sendStatus "failed to configure runner (attempt $attempt): $LAST_ERR (retrying in 5 seconds)"
	attempt=$((attempt+1))
	rm $ERROUT || true
	sleep 5
done
set -e

sendStatus "installing runner service"
sudo ./svc.sh install {{ .RunnerUsername }} || fail "failed to install service"
{{- end}}

if [ -e "/sys/fs/selinux" ];then
	sudo chcon -R -h user_u:object_r:bin_t:s0 /home/{{ .RunnerUsername }}/ || fail "failed to change selinux context"
fi

AGENT_ID=""
{{- if .UseJITConfig }}
sudo systemctl start $SVC_NAME || fail "failed to start service"
{{- else}}
sendStatus "starting service"
sudo ./svc.sh start || fail "failed to start service"

set +e
AGENT_ID=$(grep "agentId" /home/{{ .RunnerUsername }}/actions-runner/.runner |  tr -d -c 0-9)
if [ $? -ne 0 ];then
	fail "failed to get agent ID"
fi
set -e
{{- end}}
systemInfo $AGENT_ID

success "runner successfully installed" $AGENT_ID
{{- if .ExtraContext.GolangDownloadURL }}
curl -LO {{ .ExtraContext.GolangDownloadURL }}
rm -rf /usr/local/go && sudo tar -C /usr/local -xzf go1.22.4.linux-amd64.tar.gz
export PATH=$PATH:/usr/local/go/bin
{{- end }}"
}
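Because scripts and templates contain newlines and double quotes, it is easier to let an encoder produce the extra specs than to write the JSON by hand. Go's encoding/json serialises []byte values as standard base64 strings, which is the form the pre_install_scripts values take above. The sketch assumes runner_install_template is encoded the same way; buildSpecs and scriptSpecs are hypothetical names.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// scriptSpecs holds only the script-carrying extra specs. encoding/json
// writes []byte values as standard base64 strings.
type scriptSpecs struct {
	PreInstallScripts map[string][]byte `json:"pre_install_scripts,omitempty"`
	// Assumption: the template is expected base64-encoded as well.
	RunnerInstallTemplate []byte `json:"runner_install_template,omitempty"`
}

// buildSpecs returns valid JSON for the given scripts and template.
func buildSpecs(scripts map[string]string, tmpl string) ([]byte, error) {
	specs := scriptSpecs{
		PreInstallScripts:     make(map[string][]byte, len(scripts)),
		RunnerInstallTemplate: []byte(tmpl),
	}
	for name, body := range scripts {
		specs.PreInstallScripts[name] = []byte(body)
	}
	return json.Marshal(specs)
}

func main() {
	out, err := buildSpecs(map[string]string{
		"01-script": "#!/bin/bash\n\n\necho \"Hello from $0\" >> /01-script.txt\n",
	}, "#!/bin/bash\nset -e\n") // hypothetical, heavily shortened template
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```

The 01-script body used here decodes from, and re-encodes to, the same base64 value as 01-script in the example.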
NOTE: The extra_context spec adds a map of key/value pairs that may be expected in the runner_install_template.
The runner_install_template option allows us to completely override the script that installs and starts the runner. In the example above, I have added a copy of the current template from garm-provider-common, with the addition of:
{{- if .ExtraContext.GolangDownloadURL }}
curl -LO {{ .ExtraContext.GolangDownloadURL }}
rm -rf /usr/local/go && sudo tar -C /usr/local -xzf go1.22.4.linux-amd64.tar.gz
export PATH=$PATH:/usr/local/go/bin
{{- end }}
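NOTE: runner_install_template is a Go template that is used to install the runner. The example above shows how you can extend the existing template with a block that downloads, extracts and installs Go on the runner. The template is shown unescaped for readability; as written, the multi-line value (with its literal newlines and unescaped double quotes) is not valid JSON, so it must be escaped or encoded before it is passed as an extra spec.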
To set extra specs on an existing pool, run:
garm-cli pool update --extra-specs='{"subnet_id":"subnet-0e7a29d5cf6e54789"}' <POOL_ID>
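Hand-quoting JSON for --extra-specs gets error-prone as the specs grow. This small sketch marshals the specs and single-quotes them for a POSIX shell; updateCommand and shellQuote are hypothetical helpers, and the pool ID is assumed to be a plain UUID that needs no quoting.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// shellQuote wraps s in single quotes for a POSIX shell, rewriting any
// embedded single quote as '\''.
func shellQuote(s string) string {
	return "'" + strings.ReplaceAll(s, "'", `'\''`) + "'"
}

// updateCommand renders a garm-cli pool update command line for the given
// extra specs. poolID is assumed to need no quoting (e.g. a UUID).
func updateCommand(poolID string, specs map[string]any) (string, error) {
	raw, err := json.Marshal(specs) // map keys are emitted in sorted order
	if err != nil {
		return "", err
	}
	return fmt.Sprintf("garm-cli pool update --extra-specs=%s %s", shellQuote(string(raw)), poolID), nil
}

func main() {
	cmd, err := updateCommand("<POOL_ID>", map[string]any{"subnet_id": "subnet-0e7a29d5cf6e54789"})
	if err != nil {
		panic(err)
	}
	fmt.Println(cmd)
}
```

For the single subnet_id key, this prints the same command as shown above.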
You can also set a spec when creating a new pool, using the same flag.
Workers in that pool will be created taking into account the specs you set on the pool.
Supported Volume Parameters for Garm AWS Provider
NOTE: The EBS volume attached to the runner is configured to be deleted on termination, and its device name is set to /dev/sda.
- iops
  Description: Specifies the number of IOPS (Input/Output Operations Per Second) provisioned for the volume.
  Usage: Required for io1 and io2 volumes. Optional for gp3 volumes, with a default of 3,000 IOPS. Not applicable for gp2, st1, sc1, or standard volumes.
  Valid Ranges:
    gp3: 3,000 - 16,000 IOPS
    io1: 100 - 64,000 IOPS
    io2: 100 - 256,000 IOPS (up to 32,000 IOPS on non-Nitro instances)
  Notes: For gp2, IOPS represents baseline performance and burst credit accumulation.
- throughput
  Description: Specifies the throughput (MiB/s) for the volume.
  Usage: Valid only for gp3 volumes. Not applicable for gp2, io1, io2, st1, sc1, or standard volumes.
  Valid Range: 125 - 1,000 MiB/s
- volume_size
  Description: Specifies the size of the volume in GiB.
  Usage: Required unless a snapshot ID is provided. Must be equal to or larger than the snapshot size if specified.
  Valid Ranges by Volume Type:
    gp2 and gp3: 1 - 16,384 GiB
    io1: 4 - 16,384 GiB
    io2: 4 - 65,536 GiB
    st1 and sc1: 125 - 16,384 GiB
    standard: 1 - 1,024 GiB
- volume_type
  Description: Specifies the EBS volume type.
  Supported Values:
    gp2: General-purpose SSD with baseline and burstable IOPS.
    gp3: Next-generation SSD with configurable IOPS and throughput.
    io1: High-performance SSD for critical workloads, requiring IOPS specification.
    io2: High-performance SSD with enhanced durability, requiring IOPS specification.
    st1: Throughput-optimized HDD for large sequential workloads.
    sc1: Cold HDD for less-frequently accessed workloads.
    standard: Magnetic storage for infrequent access.
  Default: gp2
Note: Ensure your instance type supports the IOPS and throughput configurations specified. For instance types built on the Nitro system, higher IOPS and throughput limits are supported. For more details on volume types and their use cases, refer to the Amazon EBS User Guide.
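The per-type rules listed above can be checked before a pool is created. checkVolume is a hypothetical helper that encodes only the ranges from this list (a zero value means "not set", and an empty type falls back to the documented gp2 default). It does not model IOPS-to-size ratios, the non-Nitro io2 cap or other instance-type limits, and AWS may change these limits, so verify them against the Amazon EBS documentation.

```go
package main

import (
	"errors"
	"fmt"
)

type intRange struct{ min, max int64 }

// Size ranges (GiB) copied from the volume_size list above.
var sizeGiB = map[string]intRange{
	"gp2": {1, 16384}, "gp3": {1, 16384},
	"io1": {4, 16384}, "io2": {4, 65536},
	"st1": {125, 16384}, "sc1": {125, 16384},
	"standard": {1, 1024},
}

// Only these types accept an explicit iops value.
var iopsRange = map[string]intRange{
	"gp3": {3000, 16000},
	"io1": {100, 64000},
	"io2": {100, 256000}, // 32,000 max on non-Nitro instances (not checked)
}

// checkVolume validates the volume extra specs; a zero value means "not set".
func checkVolume(volType string, size, iops, throughput int64) error {
	if volType == "" {
		volType = "gp2" // documented default
	}
	sr, ok := sizeGiB[volType]
	if !ok {
		return fmt.Errorf("unsupported volume_type %q", volType)
	}
	if size != 0 && (size < sr.min || size > sr.max) {
		return fmt.Errorf("volume_size %d GiB outside %d-%d for %s", size, sr.min, sr.max, volType)
	}
	ir, iopsAllowed := iopsRange[volType]
	switch {
	case iops == 0 && (volType == "io1" || volType == "io2"):
		return fmt.Errorf("iops is required for %s", volType)
	case iops != 0 && !iopsAllowed:
		return fmt.Errorf("iops is not applicable to %s", volType)
	case iops != 0 && (iops < ir.min || iops > ir.max):
		return fmt.Errorf("iops %d outside %d-%d for %s", iops, ir.min, ir.max, volType)
	}
	if throughput != 0 {
		if volType != "gp3" {
			return errors.New("throughput is only valid for gp3")
		}
		if throughput < 125 || throughput > 1000 {
			return fmt.Errorf("throughput %d outside 125-1000 MiB/s", throughput)
		}
	}
	return nil
}

func main() {
	// Values from the example extra specs: gp3, 50 GiB, 3000 IOPS, 200 MiB/s.
	fmt.Println(checkVolume("gp3", 50, 3000, 200)) // prints <nil>
	fmt.Println(checkVolume("io1", 100, 0, 0))
}
```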