terraform-aws-jenkins-ha-agents

A Terraform module for deploying Jenkins on AWS in a highly available and highly scalable manner.

Related blog post can be found on the Neiman Marcus Medium page.

Features

Terraform & Module Version

Terraform 0.13 - Pin the module version to ~> 3.0. Submit pull requests to the master branch.

Terraform 0.12 - Pin the module version to ~> 2.0. Submit pull requests to the terraform12 branch. Only bug fixes will be accepted; all new development targets Terraform 0.13.

Terraform 0.11 - Deprecated in this module.

Usage

The examples below assume a local map of tags, referenced as local.tags.
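A minimal sketch of such a map. The keys and values are placeholders chosen for illustration, not tags the module requires:

```hcl
locals {
  # Placeholder tags; replace with your organization's tagging scheme.
  tags = {
    environment = "prod"
    owner       = "platform-team"
    project     = "jenkins"
  }
}
```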

Minimum Configuration

module "jenkins_ha_agents" {
  source  = "neiman-marcus/jenkins-ha-agents/aws"
  version = "x.x.x"

  admin_password  = "foo"
  bastion_sg_name = "bastion-sg"
  domain_name     = "foo.io."

  private_subnet_name = "private-subnet-*"
  public_subnet_name  = "public-subnet-*"

  r53_record = "jenkins.foo.io"
  region     = "us-west-2"

  ssl_certificate = "*.foo.io"
  ssm_parameter   = "/jenkins/foo"

  tags     = local.tags
  vpc_name = "prod-vpc"
}

Full Configuration with Custom Userdata and Plugins

main.tf

module "jenkins_ha_agents" {
  source  = "neiman-marcus/jenkins-ha-agents/aws"
  version = "x.x.x"

  admin_password    = "foo"
  agent_max         = 6
  agent_min         = 2
  agent_volume_size = 16

  ami_name          = "amzn2-ami-hvm-2.0.*-x86_64-gp2"
  ami_owner         = "amazon"
  api_ssm_parameter = "/api_key"

  auto_update_plugins_cron = "0 0 31 2 *"

  efs_mode                   = "provisioned"
  efs_provisioned_throughput = 3

  application     = "jenkins"
  bastion_sg_name = "bastion-sg"
  domain_name     = "foo.io."

  agent_lt_version  = "$Latest"
  master_lt_version = "$Latest"

  key_name          = "foo"
  scale_down_number = -1
  scale_up_number   = 1

  custom_plugins              = templatefile("init/custom_plugins.cfg",{})
  extra_agent_userdata        = data.template_file.extra_agent_userdata.rendered
  extra_agent_userdata_merge  = "list(append)+dict(recurse_array)+str()"
  extra_master_userdata       = data.template_file.extra_master_userdata.rendered
  extra_master_userdata_merge = "list(append)+dict(recurse_array)+str()"

  retention_in_days = 90

  executors                = 4
  instance_type_controller = ["t3a.2xlarge"]
  instance_type_agents     = ["t3a.xlarge", "t3.xlarge", "t2.xlarge"]
  jenkins_version          = "2.249.1"
  password_ssm_parameter   = "/admin_password"

  cidr_ingress        = ["0.0.0.0/0"]
  private_subnet_name = "private-subnet-*"
  public_subnet_name  = "public-subnet-*"

  r53_record      = "jenkins.foo.io"
  region          = "us-west-2"
  ssl_certificate = "*.foo.io"

  ssm_parameter = "/jenkins/foo"
  swarm_version = "3.23"
  tags          = local.tags
  vpc_name      = "prod-vpc"
}

# custom_plugins is rendered inline with templatefile() above, so it needs no data source.

data "template_file" "extra_agent_userdata" {
  template = file("init/extra_agent_userdata.cfg")

  # Terraform 0.12+ requires "vars = { ... }" (an argument), not a "vars { ... }" block.
  vars = {
    foo = "bar"
  }
}

data "template_file" "extra_master_userdata" {
  template = file("init/extra_master_userdata.cfg")

  vars = {
    foo = "bar"
  }
}
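The template_file data source comes from the hashicorp/template provider, which is deprecated. On Terraform 0.12 and later, the built-in templatefile function can render the same files. A minimal sketch, assuming the same init/ layout as above; the local names are illustrative:

```hcl
locals {
  # Rendered with the built-in templatefile() function instead of data.template_file.
  extra_agent_userdata  = templatefile("init/extra_agent_userdata.cfg", { foo = "bar" })
  extra_master_userdata = templatefile("init/extra_master_userdata.cfg", { foo = "bar" })
}
```

With this approach, pass local.extra_agent_userdata and local.extra_master_userdata to the module in place of the data source attributes.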

init/custom_plugins.cfg

---
#cloud-config

write_files:
  - path: /root/custom_plugins.txt
    content: |
      cloudbees-folder
    permissions: "000400"
    owner: root
    group: root

init/extra_agent_userdata.cfg

---
runcmd:
  - echo 'foo = ${foo}'

init/extra_master_userdata.cfg

---
runcmd:
  - echo 'foo = ${foo}'

Examples

Inputs

| Name | Description | Type | Default | Required |
|------|-------------|------|---------|:--------:|
| admin_password | The master admin password. Used to bootstrap and log in to the master. Also pushed to SSM Parameter Store for posterity. | string | N/A | yes |
| agent_lt_version | The version of the agent launch template to use. Only use if you need to programmatically select an older version of the launch template. Not recommended to change. | string | $Latest | no |
| agent_max | The maximum number of agents to run in the agent ASG. | int | 6 | no |
| agent_min | The minimum number of agents to run in the agent ASG. | int | 2 | no |
| agent_volume_size | The size of the agent volume, in GiB. | int | 16 | no |
| ami_name | The name of the Amazon Linux 2 AMI. Used to search for the AMI ID. | string | amzn2-ami-hvm-2.0.*-x86_64-gp2 | no |
| ami_owner | The owner of the Amazon Linux 2 AMI. | string | amazon | no |
| api_ssm_parameter | The path of the API key stored in SSM Parameter Store. | string | /api_key | no |
| application | The application name, interpolated into many resource names and tags. Unique to this project. | string | jenkins | no |
| auto_update_plugins_cron | Cron expression that schedules automatic plugin updates. The default, February 31st, never occurs, which disables this feature. Override this variable to have plugins update automatically. | string | 0 0 31 2 * | no |
| bastion_sg_name | The name of the bastion security group allowed to SSH to the master and agents. | string | N/A | yes |
| cidr_ingress | CIDR ranges allowed to access the load balancer. | list | ["0.0.0.0/0"] | no |
| custom_plugins | Custom plugins to install when bootstrapping. Created from a template outside of the module. | string | empty | no |
| domain_name | The root domain name used to look up the Route 53 zone. | string | N/A | yes |
| efs_mode | The EFS throughput mode: bursting or provisioned. To set the provisioned throughput in MiB/s, configure the efs_provisioned_throughput variable. | string | bursting | no |
| efs_provisioned_throughput | The EFS provisioned throughput in MiB/s. Ignored if efs_mode is bursting. | int | 3 | no |
| executors | The number of executors to assign to each agent. Must be an even number. | int | 4 | no |
| extra_agent_userdata | Extra agent user data to add to the built-in default. Created from a template outside of the module. | string | empty | no |
| extra_agent_userdata_merge | Controls how cloud-init merges custom agent user-data sections. | string | list(append) + dict(recurse_array) + str() | no |
| extra_master_userdata | Extra master user data to add to the built-in default. Created from a template outside of the module. | string | empty | no |
| extra_master_userdata_merge | Controls how cloud-init merges custom master user-data sections. | string | list(append) + dict(recurse_array) + str() | no |
| instance_type_controller | The instance types to use for the controller (master) autoscaling group. | list | t3a.xlarge | no |
| instance_type_agents | The instance types to use for the agent autoscaling group. | list | t3.xlarge, t3a.xlarge, t2.xlarge, t2a.xlarge | no |
| jenkins_version | The Jenkins version to install on the master. Change this value when a new version is released; it updates the launch configuration and the autoscaling group. | string | 2.249.1 | no |
| key_name | The name of the SSH key pair used to launch instances. | string | null | no |
| master_lt_version | The version of the master launch template to use. Only use if you need to programmatically select an older version of the launch template. Not recommended to change. | string | $Latest | no |
| password_ssm_parameter | The path of the master admin password stored in SSM Parameter Store. | string | /admin_password | no |
| private_subnet_name | The name prefix of the private subnets to pull in as a data source. | string | N/A | yes |
| public_subnet_name | The name prefix of the public subnets to pull in as a data source. | string | N/A | yes |
| r53_record | The FQDN for the Route 53 record. | string | N/A | yes |
| region | The AWS region to deploy the infrastructure to. | string | N/A | yes |
| retention_in_days | How many days to retain CloudWatch logs. | int | 90 | no |
| scale_down_number | Number of agents to remove when scaling down. | int | -1 | no |
| scale_up_number | Number of agents to add when scaling up. | int | 1 | no |
| ssl_certificate | The name of the SSL certificate to use on the load balancer. | string | N/A | yes |
| ssm_parameter | The full SSM parameter path that houses the API key and master admin password. Also used to grant IAM access to this resource. | string | N/A | yes |
| swarm_version | The version of the SWARM plugin to install on the agents. Update this value to upgrade. | string | 3.23 | no |
| tags | Tags defined locally and interpolated into the tags in this module. | map | N/A | yes |
| vpc_name | The name of the VPC the infrastructure will be deployed to. | string | N/A | yes |

Outputs

| Name | Description |
|------|-------------|
| agent_asg_name | The name of the agent ASG. Use it to attach additional external resources. |
| agent_iam_role | The agent IAM role attributes. Use them to attach additional IAM policies. |
| master_asg_name | The name of the master ASG. Use it to attach additional external resources. |
| master_iam_role | The master IAM role name. Use it to attach additional IAM policies. |
| r53_record_fqdn | The FQDN of the Route 53 record. |

Known Issues/Limitations

N/A

Notes

Breaking Changes

v2.5.0

v2.1.0

How it works

The architecture is simple on the surface but has a lot going on under the hood. As in a basic web application architecture, a load balancer sits in front of the master autoscaling group, which connects directly to the agent autoscaling group.

Master Node Details

The master node runs in an autoscaling group using the Amazon Linux 2 AMI. The autoscaling group has a minimum and maximum of one instance and never scales out or in. The instance can run in either of two availability zones. The group is fronted by an ELB whose health check determines instance health: if Jenkins stops responding on port 8080, the instance fails the health check and the autoscaling group replaces it.
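A minimal sketch of this single-instance pattern, assuming a classic ELB. The resource names, var.private_subnet_ids, var.public_subnet_ids, var.certificate_arn, and aws_launch_template.master are hypothetical and not taken from the module's source:

```hcl
resource "aws_elb" "master" {
  name    = "jenkins-master"
  subnets = var.public_subnet_ids # hypothetical

  listener {
    lb_port            = 443
    lb_protocol        = "https"
    instance_port      = 8080
    instance_protocol  = "http"
    ssl_certificate_id = var.certificate_arn # hypothetical
  }

  # Mark the instance unhealthy if nothing accepts connections on 8080.
  health_check {
    target              = "TCP:8080"
    interval            = 30
    timeout             = 5
    healthy_threshold   = 2
    unhealthy_threshold = 2
  }
}

resource "aws_autoscaling_group" "master" {
  name                = "jenkins-master"
  min_size            = 1
  max_size            = 1
  desired_capacity    = 1
  vpc_zone_identifier = var.private_subnet_ids # hypothetical: subnets in two AZs
  load_balancers      = [aws_elb.master.name]

  # Use the ELB health check (not just EC2 status) to decide when to replace the instance.
  health_check_type         = "ELB"
  health_check_grace_period = 600

  launch_template {
    id      = aws_launch_template.master.id # hypothetical
    version = "$Latest"
  }
}
```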

The name of the master autoscaling group is identical to the name of the master launch configuration. This is intentional: if the launch configuration is updated, the master autoscaling group is recreated with the new launch configuration.
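The naming trick described above is a common Terraform pattern. A minimal sketch, with illustrative names and a hypothetical var.ami_id and var.private_subnet_ids; the module's actual lifecycle settings may differ:

```hcl
resource "aws_launch_configuration" "master" {
  # name_prefix makes Terraform generate a new unique name for every replacement.
  name_prefix   = "jenkins-master-"
  image_id      = var.ami_id # hypothetical
  instance_type = "t3a.xlarge"

  lifecycle {
    create_before_destroy = true
  }
}

resource "aws_autoscaling_group" "master" {
  # Reusing the launch configuration's generated name means any launch
  # configuration change also changes the ASG name, forcing a new ASG.
  name                 = aws_launch_configuration.master.name
  launch_configuration = aws_launch_configuration.master.name
  min_size             = 1
  max_size             = 1
  vpc_zone_identifier  = var.private_subnet_ids # hypothetical

  lifecycle {
    create_before_destroy = true
  }
}
```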

Jenkins data is persisted on an EFS volume, with a mount target in each availability zone.
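A minimal sketch of an EFS file system with one mount target per availability zone, assuming one private subnet per AZ. var.private_subnet_ids and aws_security_group.efs are hypothetical:

```hcl
resource "aws_efs_file_system" "jenkins" {
  encrypted = true

  # Corresponds to efs_mode; "provisioned" also needs provisioned_throughput_in_mibps.
  throughput_mode = "bursting"

  tags = local.tags
}

# One mount target per private subnet, so the master can mount the
# volume in whichever availability zone it lands in.
resource "aws_efs_mount_target" "jenkins" {
  for_each = toset(var.private_subnet_ids) # hypothetical

  file_system_id  = aws_efs_file_system.jenkins.id
  subnet_id       = each.value
  security_groups = [aws_security_group.efs.id] # hypothetical
}
```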

During initial launch, the master will generate an API key and publish it to SSM Parameter store.

Agent Nodes Details

Agent nodes also run in an autoscaling group using the Amazon Linux 2 AMI, in the same availability zones.

Agents connect to the master node through the Jenkins SWARM plugin. Each agent looks up the master's IP address with the AWS CLI and reads the API key from Parameter Store. Agents launch, configure themselves, and connect to the master. If an agent cannot connect or gets disconnected, it self-terminates, and the autoscaling group launches a replacement. This covers the case where agents launch before the master has published the API key to Parameter Store; once the key is published, the agents and master sync up. If the master is terminated, the agents terminate automatically.
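The key exchange above implies IAM permissions on both sides: the master writes the API key under ssm_parameter, and agents read it and describe instances to find the master. A minimal sketch of such policy documents, assuming root-level var.region and var.ssm_parameter; these are illustrative and not the module's actual policies:

```hcl
data "aws_caller_identity" "current" {}

locals {
  # Every parameter under ssm_parameter (e.g. /jenkins/foo/*). ssm_parameter
  # already starts with "/", so no slash is added after "parameter".
  jenkins_ssm_arn = "arn:aws:ssm:${var.region}:${data.aws_caller_identity.current.account_id}:parameter${var.ssm_parameter}/*"
}

# Master: publish the generated API key to Parameter Store.
data "aws_iam_policy_document" "master_ssm" {
  statement {
    sid       = "PublishApiKey"
    actions   = ["ssm:PutParameter", "ssm:GetParameter"]
    resources = [local.jenkins_ssm_arn]
  }
}

# Agents: discover the master and read the API key.
data "aws_iam_policy_document" "agent" {
  statement {
    sid       = "DiscoverMaster"
    actions   = ["ec2:DescribeInstances"]
    resources = ["*"] # ec2:DescribeInstances does not support resource-level permissions
  }

  statement {
    sid       = "ReadApiKey"
    actions   = ["ssm:GetParameter"]
    resources = [local.jenkins_ssm_arn]
  }
}
```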

Agents run as spot instances to keep costs down.
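One way to run an all-spot agent group across several instance types is a mixed instances policy. A minimal sketch, assuming hypothetical aws_launch_template.agent, var.private_subnet_ids, and var.instance_type_agents; the module may request spot capacity differently (for example, through launch template market options):

```hcl
resource "aws_autoscaling_group" "agent" {
  name                = "jenkins-agent"
  min_size            = 2 # agent_min default
  max_size            = 6 # agent_max default
  vpc_zone_identifier = var.private_subnet_ids # hypothetical

  mixed_instances_policy {
    instances_distribution {
      on_demand_base_capacity                  = 0
      on_demand_percentage_above_base_capacity = 0 # 0% on-demand, i.e. all spot
      spot_allocation_strategy                 = "lowest-price"
    }

    launch_template {
      launch_template_specification {
        launch_template_id = aws_launch_template.agent.id # hypothetical
        version            = "$Latest"
      }

      # One override per instance type, giving spot more pools to draw from.
      dynamic "override" {
        for_each = var.instance_type_agents
        content {
          instance_type = override.value
        }
      }
    }
  }
}
```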

Agent Scaling Details

Agents scale based on CPU and on the Jenkins build queue. The master node polls itself for the number of busy executors and publishes the result as a CloudWatch metric, which CloudWatch alarms evaluate. If fewer than half of the executors are available, the agent autoscaling group scales up; if executors are idle, it scales down. This is configured in the cloud-init user data.
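A minimal sketch of the scale-up half of this loop: an alarm on a custom executor metric triggers a simple scaling policy. The namespace and metric name are hypothetical stand-ins for whatever the master's user data publishes, and aws_autoscaling_group.agent is assumed to exist. A scale-down policy would mirror this with scaling_adjustment = -1 (scale_down_number) and an alarm on idle executors:

```hcl
resource "aws_autoscaling_policy" "agent_scale_up" {
  name                   = "jenkins-agent-scale-up"
  autoscaling_group_name = aws_autoscaling_group.agent.name # assumed to exist
  adjustment_type        = "ChangeInCapacity"
  scaling_adjustment     = 1 # scale_up_number default
  cooldown               = 300
}

resource "aws_cloudwatch_metric_alarm" "executors_low" {
  alarm_name          = "jenkins-available-executors-low"
  namespace           = "Jenkins"                    # hypothetical namespace
  metric_name         = "AvailableExecutorsPercent" # hypothetical metric
  statistic           = "Average"
  period              = 60
  evaluation_periods  = 2
  comparison_operator = "LessThanThreshold"
  threshold           = 50 # fewer than half of the executors available

  alarm_actions = [aws_autoscaling_policy.agent_scale_up.arn]
}
```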

Updating Jenkins/SWARM Version

To update Jenkins or the SWARM plugin, change jenkins_version or swarm_version in your terraform.tfvars file and redeploy the stack. The master rebuilds with the new version of Jenkins while its configuration persists on the EFS volume, and the agents redeploy with the new version of SWARM.
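A minimal sketch, assuming your root module declares jenkins_version and swarm_version variables and passes them to the module arguments of the same names (this pass-through is an assumption about your setup, not something the module requires). The values shown are the ones used elsewhere in this README; substitute the releases you want:

```hcl
# terraform.tfvars
jenkins_version = "2.249.1"
swarm_version   = "3.23"
```

After editing the file, run terraform plan to review the changes to the master and agent groups before running terraform apply.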

Auto Updating Plugins

The master can check for plugin updates and install them automatically. This feature is disabled by default; to enable it, set the auto_update_plugins_cron argument. The master also saves the list of plugins to /var/lib/jenkins/plugin-updates/archive for later review. Consider using a service such as AWS Backup to take daily backups of the EFS volume, and schedule the cron during a maintenance window.
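A minimal sketch of enabling automatic updates. The weekly schedule is only an example; pick a time inside your own maintenance window:

```hcl
module "jenkins_ha_agents" {
  source  = "neiman-marcus/jenkins-ha-agents/aws"
  version = "x.x.x"

  # Required arguments omitted here; see Minimum Configuration.

  # Five-field cron: minute hour day-of-month month day-of-week.
  # Runs every Sunday at 03:00.
  auto_update_plugins_cron = "0 3 * * 0"
}
```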

Diagram

FAQ

Why not use ECS or Fargate?

ECS still requires managing instances with an autoscaling group, in addition to the ECS containers and configuration. Just using autoscaling groups is less management overhead.

Fargate cannot be used with the master node as it cannot currently mount EFS volumes. It is also more costly than spot pricing for the agents.

Why not use a plugin to create agents?

The goal is to completely define the deployment with code. If a plugin is used and configured for agent deployment, defining the solution as code would be more challenging. With the SWARM plugin, and the current configuration, the infrastructure deploys instances, and the instance user data connects. The master is only used for scaling in and out based on executor load.

Possible Improvements

Below is a list of possible improvements that have been identified. Feel free to develop and test them; they may or may not be implemented.

Authors

Conduct / Contributing / License

Acknowledgments