Sergio Matone

Posted on Jun 17

Arc Tideline: ephemeral GitHub Actions runners powered by Karpenter

#devops #cicd #githubactions #karpenter

The problem: Long-running GHA workflows

As a DevOps engineer, I maintain multiple repositories that rely heavily on GitHub Actions workflows for their CI/CD pipelines: linting, testing, benchmarking, releasing binaries and baking OCI images.
Some of these workflows are straightforward and finish quickly. Many others perform heavy tasks, such as large e2e test suites or complex compilation steps to build OCI images or release binaries.
When such workflows gate merges into the main branch, this is particularly annoying: the merge cannot complete until every workflow has finished, and letting them finish after the merge would defeat their purpose.

On the other hand, the GitHub-hosted runners that run these tasks by default are small, general-purpose machines, and therefore slow for heavy workloads. GitHub also supports self-hosted runners: machines of any kind and size that users provision and operate themselves, and that can take on heavier or longer workflows.

We started with this solution, but it has a significant shortcoming: the machine must always be up to receive workflows whenever a PR is merged in any repo, which means it also runs when nobody is merging anything. Since these runners were meant to make room for heavy tasks, in practice this means keeping a fairly large machine always up and reachable, with the corresponding cost.

The solution: ARC

But self-hosted runners do not have to be always-on servers. GitHub provides the Actions Runner Controller (ARC): a Kubernetes operator that orchestrates and scales self-hosted runners for GitHub Actions.

In practice, we can run self-hosted runners inside a Kubernetes cluster as ephemeral Pods that are spawned on demand, with no need to be always up.

Note: here we refer to the new way of running ARC, the Autoscaling Runner Scale Sets mode, not the legacy mode based on the old controller and RunnerDeployments.

This simply requires installing two Helm releases on a Kube cluster:

the Runner Scale Set Controller chart, which installs the CRDs and runs the reconcile loops. It watches each Runner Scale Set, talks to the GitHub Actions service, and creates/destroys ephemeral runner pods on demand.
the Runner Scale Set chart, which creates a self-managed, ephemeral, autoscaling group of runners that registers with GitHub and picks up jobs targeted at it.
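As a rough sketch of these two releases, the helper below only prints the `helm install` commands, so they can be reviewed before running. The chart locations are the OCI charts published by the ARC project; the `arc` release name, the `arc-systems` namespace and the secret name are assumptions of mine, not something fixed by ARC.

```shell
#!/usr/bin/env bash
# Sketch: print the two `helm install` commands for ARC in scale-set mode.
# Release names, namespaces and the secret name are illustrative assumptions.

CHARTS="oci://ghcr.io/actions/actions-runner-controller-charts"

arc_install_commands() {
  local release="$1"     # Runner Scale Set release name, later used in runs-on
  local config_url="$2"  # GitHub repository or organization URL
  local secret="$3"      # name of a pre-created Kubernetes secret with the credentials

  # 1) The controller: installs the CRDs and runs the reconcile loops
  echo "helm install arc --namespace arc-systems --create-namespace ${CHARTS}/gha-runner-scale-set-controller"

  # 2) One runner scale set, registered against config_url
  echo "helm install ${release} --namespace arc-runner --create-namespace" \
       "--set githubConfigUrl=${config_url} --set githubConfigSecret=${secret}" \
       "${CHARTS}/gha-runner-scale-set"
}

# Usage: review the output, then pipe it to `sh` to actually run it.
# arc_install_commands custom-ci-self-runner https://github.com/my-org/my-repo arc-github-secret
```

Printing instead of executing keeps the sketch safe to try on a machine without cluster access.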

Adopting ARC requires just a few changes to an existing setup:

A new or existing Kubernetes cluster on which to deploy the ARC Helm charts
Changing the runs-on directive in the jobs of the target workflows, using the name of the Runner Scale Set Helm release as its value.

jobs:
  build:
    runs-on: custom-ci-self-runner

Granting access to the target repositories by setting up a valid GitHub authentication method and configuring it in the Scale Set Helm chart. The allowed methods are:
- GitHub PAT (classic or fine-grained)
- GitHub Application

Both need explicit access to the target repositories; please refer to the official documentation.

In an organization this can be very tricky to set up, because of the cross permissions that have to be configured.
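To give an idea of what configuring the credentials can look like, here is a small sketch that prints a `kubectl create secret` command for either method. The key names (`github_token`, `github_app_id`, `github_app_installation_id`, `github_app_private_key`) are the ones I understand the scale set chart to expect; the secret name and the `${...}` placeholders are hypothetical and must be filled in by you.

```shell
#!/usr/bin/env bash
# Sketch: print the kubectl command that stores GitHub credentials for ARC.
# The secret name and the ${...} placeholders in the output are assumptions.

arc_secret_command() {
  local method="$1" name="${2:-arc-github-secret}"
  case "$method" in
    pat)
      # Personal access token (classic or fine-grained)
      echo "kubectl create secret generic ${name} --namespace arc-runner" \
           "--from-literal=github_token=\${GITHUB_PAT}"
      ;;
    app)
      # GitHub App: app id, installation id and private key file
      echo "kubectl create secret generic ${name} --namespace arc-runner" \
           "--from-literal=github_app_id=\${APP_ID}" \
           "--from-literal=github_app_installation_id=\${INSTALLATION_ID}" \
           "--from-file=github_app_private_key=\${PRIVATE_KEY_FILE}"
      ;;
    *)
      echo "unknown method: ${method}" >&2
      return 1
      ;;
  esac
}

# Usage: arc_secret_command pat   or   arc_secret_command app my-secret
```

The secret name can then be passed to the Scale Set chart instead of inlining credentials in the values.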

The new problem: Amount of nodes needed to run GHA workflows

ARC solved the main problem: runners are now ephemeral and are released when their jobs end. But there is still an issue to solve: ARC Runner Scale Sets autoscale Pods, and in Kubernetes Pods run on Nodes.

So we are back to the first issue: avoiding high costs for unused hardware capacity, while still having enough room in the cluster for the GHA workflow Pods.

The new solution: Karpenter - the Kubernetes node provisioner

Karpenter is a Kubernetes operator that automatically adds Nodes to the cluster, and removes them from it, according to declarative configuration.

As of today Karpenter mainly works with EKS or AKS, and some efforts are ongoing for GKE. Here I’ll reference the EKS setup, where I actually use Karpenter.

The basic CRDs of Karpenter are:

EC2NodeClass: the infrastructure template for nodes: AMI, IAM role, subnets, security groups, disk.
NodePool: which nodes Karpenter may create and how it manages them: instance requirements (type/size/arch/capacity-type), scaling limits, and disruption rules (consolidation, expiry, budgets). It references a NodeClass via nodeClassRef for the actual machine details.

NodeClass and NodePool

Node Class

apiVersion: karpenter.k8s.aws/v1
kind: EC2NodeClass
metadata:
  name: my-nodeclass
spec:
  amiSelectorTerms:
    - alias: bottlerocket@latest
  role: KarpenterNodeRole-custom_ci-cluster
  subnetSelectorTerms:
    - tags:
        karpenter.sh/discovery: custom_ci-cluster
  securityGroupSelectorTerms:
    - tags:
        karpenter.sh/discovery: custom_ci-cluster

The manifest above defines:

Bottlerocket as the reference AMI for new instances
a specific role, KarpenterNodeRole-custom_ci-cluster, for the instances
- it is created automatically by terraform-aws-modules when using Terraform
- the following policies are attached: AmazonEKSWorkerNodePolicy, AmazonEC2ContainerRegistryPullOnly, AmazonEKS_CNI_Policy
tags as the convention for selecting subnets and security groups

  karpenter.sh/discovery: custom_ci-cluster

The Karpenter operator uses the subnets and security groups tagged with that key/value pair when configuring new Nodes.

These tags were set when the VPC and the EKS cluster were created with Terraform.

  private_subnet_tags = {
    "kubernetes.io/role/internal-elb" = 1
    # Tags subnets for Karpenter auto-discovery
    "karpenter.sh/discovery" = "custom_ci-cluster"
  }
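Before relying on the selectors, it can help to check which resources actually carry the tag. The sketch below builds the AWS CLI filter for the discovery tag and lists the matching subnets and security groups; the function names are mine, and running the second one requires valid AWS credentials.

```shell
#!/usr/bin/env bash
# Sketch: list the subnets and security groups carrying the Karpenter discovery tag.

# Build the --filters expression for a given cluster name.
discovery_filter() {
  local cluster="$1"
  echo "Name=tag:karpenter.sh/discovery,Values=${cluster}"
}

# Print the IDs of tagged subnets, then of tagged security groups.
list_discovered_resources() {
  local filter
  filter="$(discovery_filter "$1")"
  aws ec2 describe-subnets --filters "$filter" \
    --query 'Subnets[].SubnetId' --output text
  aws ec2 describe-security-groups --filters "$filter" \
    --query 'SecurityGroups[].GroupId' --output text
}

# Usage (needs AWS credentials): list_discovered_resources custom_ci-cluster
```

An empty result for either call would suggest that the EC2NodeClass selectors have nothing to match.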

Node Pool

apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: generic-large
spec:
  template:
    metadata:
      labels:
        type: karpenter
        dagger.sh/engine: "true"
    spec:
      # Reference Karpenter Node Class
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: my-nodeclass
      # instantiable Nodes requirements
      requirements:
        - key: karpenter.sh/capacity-type # Default on-demand
          operator: In
          values: ["on-demand", "spot"]
        - key: karpenter.k8s.aws/instance-category
          operator: In
          values: ["c", "m", "r"]
        - key: karpenter.k8s.aws/instance-size
          operator: In
          values: [large, xlarge]
        - key: "karpenter.k8s.aws/instance-generation" # Filter out older instance types
          operator: Gt
          values: ["4"]
        - key: kubernetes.io/arch
          operator: In
          values: ["amd64"]
        - key: kubernetes.io/os
          operator: In
          values: ["linux"]
      expireAfter: 4h
      terminationGracePeriod: 20m
  limits:
    cpu: "100"
    memory: 500Gi
  disruption:
    consolidationPolicy: WhenEmpty
    consolidateAfter: 20m
    budgets:
    - nodes: "10%"
      schedule: "30 0 * * *"
      duration: 3h

Some spec fields are defined here:

nodeClassRef: the reference to the EC2NodeClass, which supplies the AWS infra recipe for these nodes. The name must exactly match that of an existing EC2NodeClass.
requirements: the constraints under which Karpenter may launch instances: on-demand or spot capacity, c/m/r categories, large/xlarge sizes, generation > 4, amd64, linux. These are extremely expressive and heavily affect both the cost and the kind of instances the node pool will provision. In my experience a spot instance setup performs surprisingly well!

Be sure to check out the Karpenter reference doc for node pool configuration

expireAfter: every node is force-retired 4 hours after creation
terminationGracePeriod: max time a node may drain before Karpenter forcibly deletes it. It caps how long jobs can block termination.

Beyond the template, the pool sets pool-wide bounds and disruption behavior:

limits: the ceiling on total provisioned capacity across all nodes in this pool. Karpenter stops launching new nodes once the sum of their CPU/memory hits these values.
disruption: governs how Karpenter voluntarily reshapes the cluster over time: consolidating workloads onto fewer or cheaper nodes, retiring nodes once they expire, and replacing those that have drifted from their desired spec. Disruption budgets keep this in check, limiting how many nodes Karpenter may disturb at once and when, so optimization never comes at the cost of stability. A budget can cap disruptions by a node count or percentage, restrict them to a recurring time window via a schedule, and even apply only to specific reasons such as emptiness, underutilization, or drift.

The disruption policy can be extremely tricky to define and understand; please refer to the official doc or ask AI for help defining it the way you expect.
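To make a percentage budget more concrete, here is a small arithmetic sketch. It assumes the rule described in the Karpenter documentation as I understand it: the percentage is rounded up over the pool's nodes, then nodes already being deleted or not ready are subtracted. The function name is mine.

```shell
#!/usr/bin/env bash
# Sketch: how many nodes a percentage budget lets Karpenter disrupt at once.
# Assumed rule: roundup(total * pct / 100) - deleting - not_ready, never below 0.

allowed_disruptions() {
  local total="$1" pct="$2" deleting="${3:-0}" not_ready="${4:-0}"
  # Integer ceiling of total * pct / 100
  local allowed=$(( (total * pct + 99) / 100 - deleting - not_ready ))
  if (( allowed < 0 )); then
    allowed=0
  fi
  echo "$allowed"
}

allowed_disruptions 19 10     # 19 nodes, 10% budget              -> prints 2
allowed_disruptions 10 10 1   # 10 nodes, one already terminating -> prints 0
```

With the `nodes: "10%"` budget of the NodePool above, a pool of 19 nodes could therefore lose at most two nodes at a time while the budget's schedule window is active.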

Putting it all together

Using DevSpace and a loop

After provisioning the full cluster, including Karpenter, with Terraform, I wanted a way to quickly deploy resources on the cluster without resorting to fancy bash scripts, even though they are always a reliable fallback plan. I prefer something mostly declarative, which is less error-prone to change than a bash script.

I used to work with Google's Skaffold, but I often felt it lacked expressive flexibility, so I switched to DevSpace, an open-source dev tool for Kubernetes.

It allowed me to define:

a basic pipeline with the Helm charts to install: the ARC controller and Karpenter
a dynamic deployment list (though that’s imperative and in Bash, ouch!) to create multiple ARC Runner Scale Sets, each corresponding to a NodePool, according to the requirements of the workflows themselves. The matching purge script looks like this:

  #!/bin/bash
  set -e

  echo "Purging multiple Arc Runner releases"

  # Each ARC_RUNNER_SIZE_* variable holds one runner size (e.g. medium, large)
  for size_env in $(printenv | grep '^ARC_RUNNER_SIZE_'); do
    # Keep everything after the first '=' as the size
    size="${size_env#*=}"
    release_name="custom-ci-self-runner-${size}"

    echo ""
    echo "==============================="
    echo "Removing Release: $release_name"
    echo "==============================="

    helm uninstall "${release_name}" --namespace arc-runner --wait
  done
  done
  echo "Successfully removed Arc Runner helm releases"

The latter is the key to the whole project: multiple Runner Scale Sets are deployed, each responsible for receiving its target GHA jobs and running them in Pods. These Pods are scheduled onto Nodes that are in turn dynamically spawned and disrupted by Karpenter.

Here we defined the different sets according to the estimated size of the workflows (medium, large, x-large) and their corresponding capacity requirements. In terms of GHA workflows, this only means changing the runs-on line to the corresponding Runner Scale Set label.

# https://github.com/sw360cab/arc-tideline/blob/master/.github/workflows/arc-job.yaml
name: Actions Runner Controller Demo
on:
  workflow_dispatch:
jobs:
  Explore-GitHub-Actions:
    runs-on: custom-ci-self-runner-medium
    steps:
    - run: echo "🎉 This job uses runner scale set runners!"
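The per-size releases that these labels point at could be installed with a loop mirroring the purge script shown earlier. This is only a sketch: the `values-<release>.yaml` files are hypothetical, and it assumes the same `ARC_RUNNER_SIZE_*` variables and `arc-runner` namespace.

```shell
#!/usr/bin/env bash
# Sketch: derive one release name per ARC_RUNNER_SIZE_* variable and install it.

# Print one release name per size found in the environment, in a stable order.
arc_runner_releases() {
  local line size
  printenv | grep '^ARC_RUNNER_SIZE_' | sort | while IFS= read -r line; do
    size="${line#*=}"   # keep everything after the first '='
    echo "custom-ci-self-runner-${size}"
  done
}

# Install or upgrade every release (needs helm and cluster access).
install_arc_runners() {
  local release
  for release in $(arc_runner_releases); do
    # values-<release>.yaml is a hypothetical per-size values file
    helm upgrade --install "$release" \
      oci://ghcr.io/actions/actions-runner-controller-charts/gha-runner-scale-set \
      --namespace arc-runner --create-namespace \
      --values "values-${release}.yaml" --wait
  done
}

# Usage: ARC_RUNNER_SIZE_1=medium ARC_RUNNER_SIZE_2=large install_arc_runners
```

Using `helm upgrade --install` makes the loop safe to re-run when a size is added later.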

Adding more tools to the Nodes

A GHA workflow can contain basically any kind of job, from linting to deployments, and jobs often install and use specific tools. This is the case for Dagger, which a job invokes through the Dagger CLI. The CLI in turn auto-provisions its own Dagger Engine, leveraging Docker to start a dagger-engine container.

With this setup, however, there is another possibility: each Kubernetes Node can run a single Dagger Engine, shared by every Pod that runs a Dagger function and is scheduled on that Node.

template:
  spec:
    initContainers:
    - name: dagger-cli
      image: alpine:3
      command:
        - sh
        - -o
        - pipefail
        - -exc
        - |-
          # sleep infinity
          apk add curl
          if [ ! -f $BIN_DIR/dagger ]
          then
            if ! curl --fail --silent --show-error https://dl.dagger.io/dagger/install.sh | sh; then
              echo "Dagger CLI install failed"
              exit 1
            fi
            $BIN_DIR/dagger version
          fi
      env:
        - name: BIN_DIR
          value: /opt/dagger/bin
        - name: DAGGER_VERSION
          value: {{ .Values.daggerVersion | default "latest" }}
      volumeMounts:
        - name: dagger-cli
          mountPath: /opt/dagger/bin
    containers:
    - name: runner
      image: ghcr.io/actions/actions-runner:2.333.0
      command:
        - bash
        - -exc
        - |-
          sudo cp /opt/dagger/bin/dagger /bin/dagger
          sudo apt-get update
          sudo apt-get install -y --no-install-recommends git-core curl
          sudo rm -rf /var/lib/apt/lists/*
          exec /home/runner/run.sh
      env:
        - name: _EXPERIMENTAL_DAGGER_RUNNER_HOST
          value: unix:///var/run/dagger/engine.sock
      volumeMounts:
        - name: dagger-cli
          mountPath: /opt/dagger/bin
        - name: dagger-engine
          mountPath: /var/run/dagger
    volumes:
    - name: dagger-cli
      emptyDir: {}
    # Dagger engine installed from Dagger's Helm chart,
    # which is configured to use hostPath for run volume
    - name: dagger-engine
      hostPath:
        path: /var/run/dagger-dagger-engine-dagger-helm

Here is how:

the Dagger Engine is installed automatically as a DaemonSet, so that each matching Node runs it. This applies only to the Kubernetes Nodes matching the node selection expression (key: dagger.sh/engine, operator: Exists)
the Dagger CLI is installed automatically in each GHA Runner Scale Set Pod, using the initContainers in the template section of the values exposed by the Helm chart itself
the Dagger Engine is then referenced within the Pod running the Dagger CLI using
- a Volume pointing to a hostPath: /var/run/dagger-dagger-engine-dagger-helm
- an environment variable named _EXPERIMENTAL_DAGGER_RUNNER_HOST with value unix:///var/run/dagger/engine.sock
finally, the Dagger CLI launched within the GHA workflow picks up this variable and does not provision a new Dagger Engine of its own

The node selection rules defined in the Helm release of the Dagger Engine are satisfied in any NodePool by adding the node label dagger.sh/engine: "true". This way the Dagger Engine DaemonSet runs only on Nodes carrying this label, and it starts as soon as such a Node joins the cluster.
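A workflow step could verify this wiring before calling Dagger. The sketch below is a hypothetical preflight check, not part of Dagger itself: it derives the socket path from the environment variable and tests whether a Unix socket exists there.

```shell
#!/usr/bin/env bash
# Sketch: preflight check that the node-level Dagger Engine socket is mounted.

# Strip the unix:// scheme from a runner-host value to get the socket path.
engine_socket_path() {
  local host="$1"
  echo "${host#unix://}"
}

# Succeeds only if the resulting path exists and is a Unix socket.
engine_socket_ready() {
  [ -S "$(engine_socket_path "$1")" ]
}

if engine_socket_ready "${_EXPERIMENTAL_DAGGER_RUNNER_HOST:-}"; then
  echo "Dagger Engine socket found on this node"
else
  echo "No Dagger Engine socket found at the configured path"
fi
```

If the check fails on a Karpenter node, the first things to look at are the dagger.sh/engine label on the NodePool and the hostPath used by the Engine's Helm release.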

Conclusions

In the end, all the repositories I maintain can now run their GHA workflows on ephemeral self-hosted runners; these run in Pods scheduled on dynamic, ephemeral Nodes in a Kubernetes cluster.

With these simple configurations I achieved a performant solution for running GitHub Actions on self-hosted runners. It is not only highly adaptable and configurable: the ephemeral runners also run on Nodes that Karpenter provisions on demand and removes, according to the disruption policy, once they are no longer needed, which maximizes cost savings as well.

Please visit the Arc Tideline repo and feel free to use it or fork it.

References

Arc Tideline repository
actions/actions-runner-controller: Kubernetes controller for GitHub Actions self-hosted runners
Karpenter
On-Demand Dagger Engines with Argo CD, EKS, and Karpenter | Dagger Blog - The blog post inspiring this work.

DEV Community