k-andy

<img align="left" height="250" src="logo.svg"/>

Zero friction Kubernetes stack on Hetzner Cloud

This Terraform module installs a high-availability K3s cluster with an embedded etcd datastore in a private network on Hetzner Cloud. The default configuration (three cx11 control-plane servers and two cx21 agents; see the inputs below) comes to roughly 20€ per month.


Note: Are you looking for the next generation API Developer Platform? 🔎 Have a look at WunderGraph: turn your services, databases and 3rd-party APIs into a secure unified API in just a few minutes. 🪄


What is K3s?

K3s is a lightweight, certified Kubernetes distribution. It is packaged as a single binary and ships with solid defaults for storage and networking, but we replaced the local-path-provisioner with the Hetzner CSI driver and the Klipper load balancer with the Hetzner Cloud Controller Manager. The default ingress controller (Traefik) has been disabled.

Hetzner Cloud integration: private networking, persistent volumes via the Hetzner CSI driver, and load balancers via the Hetzner Cloud Controller Manager.

Auto-K3s-Upgrades

Enable the upgrade-controller (enable_upgrade_controller = true) and specify your target k3s version (upgrade_k3s_target_version). See here for possible versions.
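A minimal sketch of what this looks like in your k-andy module block (see the Usage section below for the full configuration). The module source is a placeholder for wherever you consume k-andy from, and the version string is only an example; pick one from the k3s releases page:

```hcl
module "my_cluster" {
  source       = "<path-or-git-url-to-k-andy>" # placeholder, not the real source address
  name         = "core"
  hcloud_token = var.hcloud_token

  # Install the system-upgrade-controller and set the k3s version to upgrade to.
  enable_upgrade_controller  = true
  upgrade_k3s_target_version = "v1.22.4+k3s1" # example only; use any released k3s version
}
```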

Label the nodes you want to upgrade, e.g. kubectl label nodes core-control-plane-1 k3s-upgrade=true. The concurrency of the upgrade plan is set to 1, so you can also label them all at once. Agent nodes will be drained one by one during the upgrade.

You can label all control-plane nodes by using kubectl label nodes -l node-role.kubernetes.io/control-plane=true k3s-upgrade=true. All agent nodes can be labelled using kubectl label nodes -l !node-role.kubernetes.io/control-plane k3s-upgrade=true.

To remove the label from all nodes you can run kubectl label nodes --all k3s-upgrade-.

After a successful upgrade you can also remove the upgrade controller and its plans again by setting enable_upgrade_controller back to false.

Usage

See a more detailed example with walk-through in the example folder.
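For a quick start, a minimal configuration could look like the sketch below. The module source is a placeholder (point it at wherever you consume k-andy from); all argument names come from the inputs table that follows, and anything not set explicitly falls back to the documented defaults.

```hcl
variable "hcloud_token" {
  type      = string
  sensitive = true # Hetzner Cloud API token, passed in from your environment
}

module "my_cluster" {
  source = "<path-or-git-url-to-k-andy>" # placeholder, not the real source address

  name         = "core"           # required: cluster name, no special characters
  hcloud_token = var.hcloud_token # required: token to authenticate against Hetzner Cloud

  # Defaults shown explicitly for illustration: three cx11 control-plane nodes
  # and one agent group "default" with two cx21 servers.
  control_plane_server_count = 3
  control_plane_server_type  = "cx11"

  agent_groups = {
    "default" = {
      type      = "cx21"
      count     = 2
      ip_offset = 33
      taints    = []
    }
  }
}
```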

<!-- BEGINNING OF PRE-COMMIT-TERRAFORM DOCS HOOK -->

Inputs

| Name | Description | Type | Default | Required |
|------|-------------|------|---------|:--------:|
| <a name="input_agent_groups"></a> agent_groups | Configuration of agent groups | <pre>map(object({<br> type = string<br> count = number<br> ip_offset = number<br> taints = list(string)<br> }))</pre> | <pre>{<br> "default": {<br> "count": 2,<br> "ip_offset": 33,<br> "taints": [],<br> "type": "cx21"<br> }<br>}</pre> | no |
| <a name="input_cluster_cidr"></a> cluster_cidr | Network CIDR to use for pod IPs | string | "10.42.0.0/16" | no |
| <a name="input_control_plane_already_initialized"></a> control_plane_already_initialized | Use this if you have to replace the first control plane and want the primary to join other already existing ones and not do an init anymore. You have to update control_plane_primary_index to something else too. | bool | false | no |
| <a name="input_control_plane_primary_index"></a> control_plane_primary_index | Which of the servers should be the primary to connect to? If you change it from 1, also set control_plane_already_initialized to true. (1-indexed!) | number | 1 | no |
| <a name="input_control_plane_server_count"></a> control_plane_server_count | Number of control plane nodes | number | 3 | no |
| <a name="input_control_plane_server_type"></a> control_plane_server_type | Server type of control plane servers | string | "cx11" | no |
| <a name="input_create_kubeconfig"></a> create_kubeconfig | Create a local kubeconfig file to connect to the cluster | bool | true | no |
| <a name="input_enable_upgrade_controller"></a> enable_upgrade_controller | Install the rancher system-upgrade-controller | bool | false | no |
| <a name="input_hcloud_csi_driver_version"></a> hcloud_csi_driver_version | n/a | string | "v1.6.0" | no |
| <a name="input_hcloud_token"></a> hcloud_token | Token to authenticate against Hetzner Cloud | any | n/a | yes |
| <a name="input_k3s_version"></a> k3s_version | K3s version | string | "v1.21.3+k3s1" | no |
| <a name="input_kubeconfig_filename"></a> kubeconfig_filename | Specify the filename of the created kubeconfig file (defaults to kubeconfig-${var.name}.yaml) | any | null | no |
| <a name="input_name"></a> name | Cluster name (used in various places, don't use special chars) | any | n/a | yes |
| <a name="input_network_cidr"></a> network_cidr | Network in which the cluster will be placed. Ignored if network_id is defined | string | "10.0.0.0/16" | no |
| <a name="input_network_id"></a> network_id | If specified, no new network will be created. Make sure cluster_cidr and service_cidr don't collide with anything in the existing network. | any | null | no |
| <a name="input_server_additional_packages"></a> server_additional_packages | Additional packages which will be installed on node creation | list(string) | [] | no |
| <a name="input_server_locations"></a> server_locations | Server locations in which servers will be distributed | list(string) | <pre>[<br> "nbg1",<br> "fsn1",<br> "hel1"<br>]</pre> | no |
| <a name="input_service_cidr"></a> service_cidr | Network CIDR to use for service IPs | string | "10.43.0.0/16" | no |
| <a name="input_ssh_private_key_location"></a> ssh_private_key_location | Use this private SSH key instead of generating a new one (Attention: Encrypted keys are not supported) | string | null | no |
| <a name="input_subnet_cidr"></a> subnet_cidr | Subnet in which all nodes are placed | string | "10.0.1.0/24" | no |
| <a name="input_upgrade_controller_image_tag"></a> upgrade_controller_image_tag | The image tag of the upgrade controller (See https://github.com/rancher/system-upgrade-controller/releases) | string | "v0.8.0" | no |
| <a name="input_upgrade_controller_kubectl_image_tag"></a> upgrade_controller_kubectl_image_tag | rancher/kubectl image tag | string | "v1.21.5" | no |
| <a name="input_upgrade_k3s_target_version"></a> upgrade_k3s_target_version | Target version of k3s (See https://github.com/k3s-io/k3s/releases) | string | null | no |
| <a name="input_upgrade_node_additional_tolerations"></a> upgrade_node_additional_tolerations | List of tolerations which upgrade jobs must have to run on every node (for control-plane and agents) | list(map(any)) | [] | no |

Outputs

| Name | Description |
|------|-------------|
| <a name="output_agents_public_ips"></a> agents_public_ips | The public IP addresses of the agent servers |
| <a name="output_cidr_block"></a> cidr_block | n/a |
| <a name="output_control_planes_public_ips"></a> control_planes_public_ips | The public IP addresses of the control plane servers |
| <a name="output_k3s_token"></a> k3s_token | Secret k3s authentication token |
| <a name="output_kubeconfig"></a> kubeconfig | Structured kubeconfig data to supply to other providers |
| <a name="output_kubeconfig_file"></a> kubeconfig_file | Kubeconfig file content with external IP address |
| <a name="output_network_id"></a> network_id | n/a |
| <a name="output_server_locations"></a> server_locations | Array of hetzner server locations we deploy to |
| <a name="output_ssh_private_key"></a> ssh_private_key | Key to SSH into nodes |
| <a name="output_subnet_id"></a> subnet_id | n/a |
<!-- END OF PRE-COMMIT-TERRAFORM DOCS HOOK -->
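As a hedged example of consuming these outputs: the kubeconfig_file output can be written to disk with the hashicorp/local provider (the module can also do this for you via create_kubeconfig = true), and the IP outputs can simply be re-exported. The module name my_cluster matches the Usage sketch above.

```hcl
# Sketch only: persist the rendered kubeconfig so kubectl or CI jobs can use it.
resource "local_file" "kubeconfig" {
  content         = module.my_cluster.kubeconfig_file
  filename        = "${path.module}/kubeconfig-core.yaml"
  file_permission = "0600"
}

# Re-export a couple of the module outputs for convenience.
output "control_plane_ips" {
  value = module.my_cluster.control_planes_public_ips
}

output "agent_ips" {
  value = module.my_cluster.agents_public_ips
}
```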

Common Operations

Agent server replacement (common case)

If you need to cycle an agent, you can recreate a single node with the following procedure. Replace the group name and index with those of the server you want to recreate!

Make sure you drain the node first.

kubectl drain that-agent
terraform taint 'module.my_cluster.module.agent_group["GROUP_NAME"].random_pet.agent_suffix[1]'
terraform apply

This will recreate the agent in that group on next apply.

Sophisticated agent server replacement

If you made an unusual configuration change, or changed the base k3s version in the Terraform configuration, and Terraform now wants to replace all of your agents at once, you can replace them one at a time instead. Replacing all of them in one go is probably not a good idea.

Example for replacement of one agent (the first one of that group):

kubectl drain that-agent
terragrunt taint 'module.agent_group["GROUP_NAME"].random_pet.agent_suffix[0]'
terraform apply --target='module.agent_group["GROUP_NAME"].hcloud_server.agent["#0"]' --target='module.agent_group["GROUP_NAME"].hcloud_server_network.agent["#0"]' --target='module.agent_group["GROUP_NAME"].random_pet.agent_suffix[0]'

Control Plane server replacement

Control plane servers do not get recreated when the user-data for cloud-init changes. If you want to recreate one after changing something that would alter the cloud-init, you need to taint it.

Primary server

If you need to replace the primary control plane for some reason, you'll need to tell it to join the other control planes instead of initializing a new cluster.

Set the variable control_plane_primary_index to the index of one of the other control plane nodes (e.g. 2 or 3). Also set control_plane_already_initialized to true so it won't run cluster-init again. This makes the primary connect to control plane 2 or 3 after recreation, as sketched below.
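Sketched in Terraform (same placeholder module source as in the Usage section), the change amounts to setting those two variables on the existing module block:

```hcl
module "my_cluster" {
  source       = "<path-or-git-url-to-k-andy>" # placeholder, not the real source address
  name         = "core"
  hcloud_token = var.hcloud_token

  # Join via the second control plane instead of running cluster-init again.
  control_plane_primary_index       = 2
  control_plane_already_initialized = true
}
```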

Secondary servers

This is how you can replace the servers which didn't initialize the cluster.

terraform taint 'module.my_cluster.hcloud_server.control_plane["#1"]'
terraform apply

Auto-Upgrade

Prerequisite

Install the system-upgrade-controller in your cluster.

KUBECONFIG=kubeconfig.yaml kubectl apply -f ./upgrade/controller.yaml

Upgrade procedure

  1. Mark the nodes you want to upgrade (the command below labels all nodes).
KUBECONFIG=kubeconfig.yaml kubectl label --all node k3s-upgrade=true
  2. Run the plan for the servers.
KUBECONFIG=kubeconfig.yaml kubectl apply -f ./upgrade/server-plan.yaml

Warning: Wait for completion before you start upgrading your agents.

  3. Run the plan for the agents.
KUBECONFIG=kubeconfig.yaml kubectl apply -f ./upgrade/agent-plan.yaml

Backups

K3s automatically backs up your embedded etcd datastore every 12 hours to /var/lib/rancher/k3s/server/db/snapshots/. You can reset the cluster by pointing it at a specific snapshot.

  1. Stop the master server.
sudo systemctl stop k3s
  2. Restore the master server from a snapshot.
./k3s server \
  --cluster-reset \
  --cluster-reset-restore-path=<PATH-TO-SNAPSHOT>

Warning: This forgets all peers and the server becomes the sole member of a new cluster. You have to manually rejoin all other servers.

  3. Connect to each of the other servers. Back up and delete /var/lib/rancher/k3s/server/db on each of them, then restart k3s.
sudo systemctl stop k3s
rm -rf /var/lib/rancher/k3s/server/db
sudo systemctl start k3s

This will rejoin the servers one after another. After some time, all servers should be in sync again. Run kubectl get node to verify.

Info: There is no official tool to automate this procedure. In the future, Rancher might provide an operator to handle it (issue).

Debugging

Cloud-init logs can be found on the remote machines in /var/log/cloud-init-output.log and /var/log/cloud-init.log.

Credits