
Trusting Harbor's corporate CA in Tanzu Kubernetes Grid with Carvel tools

So you have deployed a Tanzu Community Edition workload cluster! I bet one of the first things you did was install some packages, and something tells me Harbor was the third one you installed, right after the two other packages it requires. For those who do not know what Harbor is, it is the default container registry solution that Tanzu Community Edition offers as a package installation.

These days I am spending a lot of time improving my automations around Tanzu Community Edition's deployment and second-day operations, which ultimately forces me to constantly create and destroy my clusters. Even though I initially built some functions to run after the clusters get created, it is a bit tedious to run these scripts each time a cluster gets deployed. For that reason, I decided to look more into how objects are deployed at cluster creation. Although that is a topic I would like to expand upon in another blogpost, this time we will be tweaking our Tanzu configuration so that new (and existing) clusters can seamlessly use our private container registry, which in most cases will be Harbor, deployed as a Tanzu package. Since Tanzu Community Edition uses Cluster API and the Carvel tools for its own automation, why not leverage those same components and tools to our advantage? This enables us to automate some first and second-day operations the "Tanzu way".

But what do I really mean by "Tanzu way"? Tanzu Kubernetes Grid uses the Carvel tools for diverse operations such as image building, templating and continuous application reconciliation, among others. Among the Carvel tools, the YAML Templating Tool, or ytt, is particularly powerful: it lets you template and patch YAML by writing logic directly in YAML configuration files. In combination with Cluster API, it allows you to bake some of your first and second-day operations into your cluster configuration, which then gets executed at cluster creation, reducing the amount of code you have to run once your cluster has been created.
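As a tiny illustration of what ytt does (the files and values below are made up for this example and are not part of TKG), you can render a template against a data-values file with the ytt CLI:

cat > values.yaml <<'EOF'
#@data/values
---
registry: harbor.yourdomain.com
EOF
cat > deployment.yaml <<'EOF'
#@ load("@ytt:data", "data")
---
image: #@ "{}/library/nginx:latest".format(data.values.registry)
EOF

$ ytt -f values.yaml -f deployment.yaml
image: harbor.yourdomain.com/library/nginx:latest

That same mechanism, combined with overlays, is what lets TKG patch the manifests it generates for your clusters.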

In this post I will demonstrate how to fully integrate a Harbor deployment with your clusters, by means of ytt overlays.

Now there are two very likely scenarios, and a third one which is slightly more advanced:

  • You want to be able to create container images on your jumpbox and push them into your Harbor registry.
  • You want to be able to spin up containers, based on those images, in your Tanzu workload clusters.
  • You might even want to create your own Carvel Package Repositories, or deploy third-party applications as Tanzu packages.

For that reason, three different components need to trust Harbor’s corporate CA:

  • The jumpbox(es) used for development, so you can push/pull images to/from Harbor.
  • The cluster nodes, so you can run containers that use those images.
  • The kapp controller, so you can deploy kapp applications using local Carvel Package Repositories.

Trusting Harbor’s corporate CA from your Linux jumpbox

Since we intend to upload container images to our private registry, we first need our current machine to trust Harbor’s corporate CA. In order to do so, execute the following function, which extracts the CA and adds it to your Linux machine at /usr/local/share/ca-certificates/<HARBOR_URL>-ca.crt.

function trustHarborCA-this-machine() {
# EXAMPLE USAGE: trustHarborCA-this-machine 'harbor.yourdomain.com'
# SOURCE: https://tanzucommunityedition.io/docs/latest/package-readme-harbor-2.2.3/
    local harbor_url=$1
# Make docker trust the Harbor CA (on this machine)
    sudo mkdir -p /etc/docker/certs.d/$harbor_url
    curl -sk https://$harbor_url/api/v2.0/systeminfo/getcert > ~/$harbor_url-ca.crt
    sudo cp ~/$harbor_url-ca.crt /etc/docker/certs.d/$harbor_url/ca.crt
# Make this machine trust the Harbor CA
    sudo cp ~/$harbor_url-ca.crt /usr/local/share/ca-certificates/$harbor_url-ca.crt
    sudo update-ca-certificates
# Clean up the temporary copy of the certificate
    rm ~/$harbor_url-ca.crt
}

After defining the function, export the URL of your Harbor registry as an environment variable, and run the function:

$ export harbor_url=<HARBOR_URL>           # Your harbor server (e.g. harbor.yourdomain.com)
$ trustHarborCA-this-machine $harbor_url   # Add Harbor's corporate CA to your jumpbox's list of trusted certificates
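If you want a quick sanity check before involving Docker at all, you can now hit Harbor's API over HTTPS without the -k flag, since curl uses the system trust store we just updated (the health endpoint below exists on Harbor v2.x):

$ curl https://$harbor_url/api/v2.0/health

If no certificate error comes back, the CA is trusted.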

To verify that our machine does now indeed trust Harbor’s corporate CA, we will push an image to it.

$ docker login $harbor_url                                    # You will be requested to input your admin credentials
$ docker pull nginx                                           # Pull nginx from dockerhub
$ docker tag nginx:latest $harbor_url/library/nginx:latest    # Tag your image
$ docker push $harbor_url/library/nginx                       # Push the image to your local harbor registry

Congratulations! You can now push/pull container images to/from your private Harbor registry from your jumpbox. Nevertheless, if you want to run workloads on your Tanzu clusters with those images (which I figure you do), there are a couple more steps to be taken.

Adding Harbor’s corporate CA to the cluster config

At the moment, we have not added the corporate CA in the cluster configuration, so, what will happen if we try to run a pod directly? Only one way to find out! Let’s just execute:

$ kubectl run nginx --image $harbor_url/library/nginx

Now let’s look at the status of the pod by running kubectl get pod nginx -w
You will first see a status of ContainerCreating, followed by an ErrImagePull, which will eventually turn into ImagePullBackOff. Let’s look at the pod details:

$ kubectl describe pod nginx
Failed to pull image with "x509: certificate signed by unknown authority"

As expected, the cluster was not able to pull the image, since your nodes do not trust your Harbor’s corporate CA at an OS level. Let’s stop for a second and take a look at what happens under the hood when you run a pod in a Tanzu Kubernetes Cluster.

  • You send a request to the Kube API server by executing kubectl run <EXAMPLE_POD> --image example-registry.domain.com/<Project>/<Image_Name>:<Image_Version>, indicating you want to run a workload with a certain container image.
  • After executing its algorithm to decide which node the pod will be deployed to, the Kube Scheduler reports back to the Kube API server indicating the node where the pod will run.
  • The pod gets registered and the containerd daemon of the cluster node(s) will try to pull the image from the registry. The node is essentially running crictl pull <IMAGE_NAME> (you can try this step by hand, as shown right after this list).
  • The pod gets created, and if all probes are passed, it gets a RUNNING status.
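If you want to reproduce that pull step manually, you can SSH into one of the nodes and try it yourself. This is just a quick sanity check: it assumes the default capv user that TKG uses on vSphere nodes, and that crictl is already configured on the node image (which it normally is).

$ ssh capv@<NODE_IP_ADDRESS> "sudo crictl pull $harbor_url/library/nginx:latest"

Before the fix described below, this should fail with the same x509: certificate signed by unknown authority error.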

As indicated above, each node where a pod has to run will pull the container image by itself, without intervention of the Kube API server. It is for that reason that the nodes themselves must trust the registry’s certificate authority. Since the cluster nodes are ultimately Linux VMs, all we need is for the .pem certificate to be present in the /etc/ssl/certs/ directory. We will achieve this by adding a ytt overlay.

YTT overlays are used to patch YAML at runtime instead of needing to template it, which enables you to edit the default deployment of a Tanzu Kubernetes Cluster by adding (almost) any customization you want. In our case, we need to add a ytt overlay which will ensure that the Harbor CA cert gets written into the cluster nodes’ filesystem. When looking into achieving this, I was redirected by my fellow Tanzu Community Edition member and guru, Scott Rosenberg, to Toshiaki Maki’s github, specifically to this page, where Toshiaki also explains the process in detail.

After seeing how Toshiaki’s solution worked, but was not fully automated, I saw an opportunity to contribute and build on top of his work. Given that Ubuntu and Photon nodes deal with certificates in different ways, Toshiaki proposed two different overlays, one for each operating system. Initially, I built a function with an if statement that would check the OS_NAME in the cluster configuration files, and subsequently create the corresponding overlay. Since the cluster’s OS_NAME, and other parameters, can be defined at different levels (in ~/.config/tanzu/tkg/config.yaml to use the same OS for all clusters, or in ~/.config/tanzu/tkg/clusterconfigs/<CLUSTER_NAME>_cluster_config.yaml to choose the OS per cluster), making the function dependent on the location of the cluster configuration file posed some extra challenges.

After realizing that both overlays were almost identical, only differing in the command used for updating/rehashing the certificates, I decided to bake that logic into the overlay itself. Since ytt knows all parameters in the cluster configuration files at cluster creation, I just added this section:

#@ if data.values.OS_NAME == "photon":
#@overlay/append
- '! which rehash_ca_certificates.sh 2>/dev/null || rehash_ca_certificates.sh'
#@ else:
#@overlay/append
- '! which update-ca-certificates 2>/dev/null || (mv /etc/ssl/certs/harbor-ca.pem /usr/local/share/ca-certificates/tkg-custom-ca.crt && update-ca-certificates)'
#@ end

This reduces the dependencies on the jumpbox’s filesystem, trims the function by half, and makes it possible to trust Harbor’s corporate CA with a single overlay, regardless of the node’s operating system. With all of that said, you can use the following function to add the Harbor corporate CA to your cluster configuration. First define it:

function add-harbor-cert-to-cluster-config() {
# EXAMPLE USAGE: add-harbor-cert-to-cluster-config 'harbor.yourdomain.com'
# SOURCE: https://github.com/making/blog.ik.am/blob/master/content/00675.md
    local harbor_url=$1
# Extract Harbor's certificate and add it to the ytt provider configuration
    mkdir -p ${HOME}/.config/tanzu/tkg/providers/ytt/04_user_customizations/harbor
    curl -sk https://$harbor_url/api/v2.0/systeminfo/getcert > ${HOME}/.config/tanzu/tkg/providers/ytt/04_user_customizations/harbor/harbor-ca.pem
# Create a yaml overlay which will refer to the Harbor CA (quoted heredoc, so the overlay is written verbatim)
    cat > ${HOME}/.config/tanzu/tkg/providers/ytt/04_user_customizations/harbor/custom-ca.yaml <<'EOF'
#@ load("@ytt:overlay", "overlay")
#@ load("@ytt:data", "data")
#@ load("@ytt:yaml", "yaml")

#! Trust your custom CA certificates on all Control Plane nodes.
#@overlay/match by=overlay.subset({"kind":"KubeadmControlPlane"})
---
spec:
  kubeadmConfigSpec:
    #@overlay/match missing_ok=True
    files:
    #@overlay/append
    - content: #@ data.read("harbor-ca.pem")
      owner: root:root
      permissions: "0644"
      path: /etc/ssl/certs/harbor-ca.pem
    #@overlay/match missing_ok=True
    preKubeadmCommands:
    #@ if data.values.OS_NAME == "photon":
    #@overlay/append
    - '! which rehash_ca_certificates.sh 2>/dev/null || rehash_ca_certificates.sh'
    #@ else:
    #@overlay/append
    - '! which update-ca-certificates 2>/dev/null || (mv /etc/ssl/certs/harbor-ca.pem /usr/local/share/ca-certificates/tkg-custom-ca.crt && update-ca-certificates)'
    #@ end

#! Trust your custom CA certificates on all worker nodes.
#@overlay/match by=overlay.subset({"kind":"KubeadmConfigTemplate"}), expects="1+"
---
spec:
  template:
    spec:
      #@overlay/match missing_ok=True
      files:
      #@overlay/append
      - content: #@ data.read("harbor-ca.pem")
        owner: root:root
        permissions: "0644"
        path: /etc/ssl/certs/harbor-ca.pem
      #@overlay/match missing_ok=True
      preKubeadmCommands:
      #@ if data.values.OS_NAME == "photon":
      #@overlay/append
      - '! which rehash_ca_certificates.sh 2>/dev/null || rehash_ca_certificates.sh'
      #@ else:
      #@overlay/append
      - '! which update-ca-certificates 2>/dev/null || (mv /etc/ssl/certs/harbor-ca.pem /usr/local/share/ca-certificates/tkg-custom-ca.crt && update-ca-certificates)'
      #@ end

#! Make the generated kapp-controller addon secret trust the CA as well.
#@ kapp_controller = lambda i,left,right: left["metadata"]["name"].endswith("-kapp-controller-addon")
#@ secret = overlay.subset({"kind": "Secret"})
#@overlay/match by=overlay.and_op(secret, kapp_controller), expects="1+"
---
stringData:
  #@overlay/replace via=lambda left, right: left.replace("config: {}", right)
  #@yaml/text-templated-strings
  values.yaml: |
    config: {caCerts: "(@= data.read("harbor-ca.pem").replace("\n", "\\n") @)"}
EOF
}

And then, run it:

$ add-harbor-cert-to-cluster-config $harbor_url

This function works by copying the CA as a .pem file into the ~/.config/tanzu/tkg/providers/ytt/04_user_customizations/ folder, where all user customizations should be added. Additionally, and more importantly, it generates a ytt overlay that will be picked up by the tanzu CLI at cluster creation, injecting the certificate into any new cluster nodes you create (the file lands at /etc/ssl/certs/harbor-ca.pem and is then registered with the node’s trust store), allowing containerd to interact with the registry.

You can check that both the certificate and the ytt overlay have been created and stored in the correct location by running:

cat ${HOME}/.config/tanzu/tkg/providers/ytt/04_user_customizations/harbor/harbor-ca.pem
cat ${HOME}/.config/tanzu/tkg/providers/ytt/04_user_customizations/harbor/custom-ca.yaml
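If you want to confirm that the overlay will actually be rendered into new clusters' manifests, you can generate them with a dry run and look for the certificate path (the cluster name and config file below are placeholders; this reuses the same --dry-run flag that the patching section further down relies on):

$ tanzu cluster create <NEW_CLUSTER_NAME> -f ~/.config/tanzu/tkg/clusterconfigs/<NEW_CLUSTER_NAME>_cluster_config.yaml --dry-run | grep -c "harbor-ca.pem"

A non-zero count means the certificate made it into the KubeadmControlPlane and KubeadmConfigTemplate definitions.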

And that was it: the nodes of any future clusters you create will trust Harbor’s corporate CA, allowing you to store your custom-made (or third-party) images and ultimately deploy workloads that use them.

Adding Harbor’s corporate CA to the kapp-controller config

While the previous scenarios might cover most cases, there is a third case which is worth looking into. I am talking about building your own Carvel Package Repositories, and making them available through Harbor. This allows you, among other things, to install and manage your applications as Tanzu packages.
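Concretely, the end goal is for a command along these lines to work against your own registry (the repository name, project and tag are hypothetical, and the namespace may differ in your setup):

$ tanzu package repository add my-custom-repo --url $harbor_url/library/my-package-repo:1.0.0 --namespace tanzu-package-repo-global

Until the kapp-controller trusts Harbor's corporate CA, a repository like this will never reconcile successfully.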

To achieve this, we need to modify the kapp-controller’s configuration. The kapp-controller is the continuous delivery and package management component of a Tanzu Kubernetes Grid cluster, which means it will constantly try to reconcile the package installations with the desired state. It does the same with any package repository you add. It is for that reason that, if you store your repository in Harbor, the kapp-controller must trust the registry’s certificate authority. Even though the cluster nodes trust the CA, the kapp-controller, which is deployed as a pod in your TKC, does not.

Digging into the kapp documentation, I found the bit where it explains how to add trusted certificates to your kapp-controller. This is done in the controller configuration spec: according to the documentation, you can accomplish this by creating a secret called kapp-controller-config, which contains the certificate under .stringData.caCerts.
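As a reference, a minimal sketch of that documented approach could look like this on a running cluster. It assumes kapp-controller lives in the tkg-system namespace and that the CA file extracted by the previous function is still on disk, and it typically also requires restarting the controller so the configuration gets picked up:

$ kubectl create secret generic kapp-controller-config \
    --namespace tkg-system \
    --from-file=caCerts=${HOME}/.config/tanzu/tkg/providers/ytt/04_user_customizations/harbor/harbor-ca.pem
$ kubectl -n tkg-system rollout restart deployment kapp-controller

Even though this is what the official kapp documentation states, and a very valid solution, it is perhaps a good idea to instead get the kapp-controller to trust the Harbor CA by mounting the certificate into the pod itself, so it gets baked into the cluster configuration like everything else. But how do we configure this? Scott Rosenberg proposes the ytt overlay below, which you can add by running the following function: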

Note: You must have run the previous function and your Harbor CA certificate must be present in ~/.config/tanzu/tkg/providers/ytt/04_user_customizations/harbor/harbor-ca.pem.

function add-harbor-corporateCA-kapp-controller() {
# EXAMPLE USAGE: add-harbor-corporateCA-kapp-controller
# SOURCE: https://github.com/making/blog.ik.am/blob/master/content/00675.md
    cat > ${HOME}/.config/tanzu/tkg/providers/ytt/02_addons/kapp-controller/add_kapp-controller.yaml <<EOF
#@ load("@ytt:overlay", "overlay")
#@ load("@ytt:template", "template")
#@ load("@ytt:data", "data")
#@ load("@ytt:yaml", "yaml")
#@ load("@ytt:base64", "base64")
#@ load("/lib/helpers.star", "ValuesFormatStr")
#@ load("kapp-controller_overlay.lib.yaml", "kappcontrolleroverlay")
#@ load("kapp-controller_addon_data.lib.yaml", "kappcontrollerdatavalues")
#@ load("/vendir/kapp-controller/kapp-controller.lib.yaml", "kapp_controller_lib")

#@ if data.values.PROVIDER_TYPE != "tkg-service-vsphere" and data.values.TKG_CLUSTER_ROLE != "workload":

#@ if data.values.DISABLE_CRS_FOR_ADDON_TYPE and "addons-management/kapp-controller" in data.values.DISABLE_CRS_FOR_ADDON_TYPE:
--- #@ template.replace(overlay.apply(kapp_controller_lib.with_data_values(kappcontrollerdatavalues()).eval(), kappcontrolleroverlay()))

#@ else:
---
apiVersion: addons.cluster.x-k8s.io/v1beta1
kind: ClusterResourceSet
metadata:
  name: #@ "{}-kapp-controller".format(data.values.CLUSTER_NAME)
  labels:
    cluster.x-k8s.io/cluster-name: #@ data.values.CLUSTER_NAME
  annotations:
    tkg.tanzu.vmware.com/addon-type: "addons-management/kapp-controller"
spec:
  strategy: "ApplyOnce"
  clusterSelector:
    matchLabels:
      tkg.tanzu.vmware.com/cluster-name: #@ data.values.CLUSTER_NAME
  resources:
  - name: #@ "{}-kapp-controller-crs".format(data.values.CLUSTER_NAME)
    kind: Secret
---
apiVersion: v1
kind: Secret
metadata:
  name: #@ "{}-kapp-controller-crs".format(data.values.CLUSTER_NAME)
  annotations:
    tkg.tanzu.vmware.com/addon-type: "addons-management/kapp-controller"
type: addons.cluster.x-k8s.io/resource-set
stringData:
  value: #@ yaml.encode(overlay.apply(kapp_controller_lib.with_data_values(kappcontrollerdatavalues()).eval(), kappcontrolleroverlay()))

#@ end
#@ end

#@ if data.values.PROVIDER_TYPE != "tkg-service-vsphere" and data.values.TKG_CLUSTER_ROLE == "workload":

---
apiVersion: v1
kind: Secret
metadata:
  name: #@ "{}-kapp-controller-addon".format(data.values.CLUSTER_NAME)
  namespace: #@ data.values.NAMESPACE
  labels:
    tkg.tanzu.vmware.com/addon-name: "kapp-controller"
    tkg.tanzu.vmware.com/cluster-name: #@ data.values.CLUSTER_NAME
    clusterctl.cluster.x-k8s.io/move: ""
  annotations:
    tkg.tanzu.vmware.com/remote-app: "true"
    tkg.tanzu.vmware.com/addon-type: "addons-management/kapp-controller"
type: tkg.tanzu.vmware.com/addon
stringData:
  values.yaml: #@ ValuesFormatStr.format(yaml.encode(kappcontrollerdatavalues()))
  #@yaml/text-templated-strings
  overlay.yaml: |
    #@ load("@ytt:overlay", "overlay")
    #@ load("@ytt:data", "data")

    #@overlay/match by=overlay.subset({"kind": "Deployment", "metadata":{"name":"kapp-controller"}}),expects=1
    ---
    spec:
      template:
        spec:
          containers:
          #@overlay/match by="name"
          - name: kapp-controller
            volumeMounts:
            #@overlay/append
            - mountPath: /etc/ssl/certs/ca-certificates.crt
              name: ca-cert
              readOnly: true
              subPath: ca.crt
          volumes:
          #@overlay/append
          - name: ca-cert
            secret:
              secretName: kapp-controller-ca-certs
              defaultMode: 420
    ---
    apiVersion: v1
    kind: Secret
    metadata:
      name: kapp-controller-ca-certs
      namespace: tkg-system
    type: Opaque
    data:
      ca.crt: (@= base64.encode(data.read("/04_user_customizations/harbor/harbor-ca.pem")) @)
#@ end
EOF
}

Execute it:

$ add-harbor-corporateCA-kapp-controller 

From now on, you will be able to make your own package repositories available to your new clusters through your Harbor registry. Scott’s helm-to-carvel-conversion-tool is a great fit for this last use case: I won’t go into detail here, but it converts Helm repositories into Carvel Package Repositories. Go check it out, it is awesome!

Applying the new configurations to existing clusters

It is important to note that, up to this point, all the last two functions have done is retrieve the corporate CA from your Harbor deployment and store it, together with two pre-configured ytt overlays, in your jumpbox’s filesystem under ~/.config/tanzu/tkg/providers/ytt, in the 02_addons and 04_user_customizations directories. Those overlays will then be picked up when any new cluster gets created. But what if you want to deploy images from your Harbor registry into already existing clusters?

Since we have already written all the new configuration, we just need the existing clusters to be re-deployed so that the nodes get re-created, this time including the certificate. We do so by simply patching the MachineDeployment, which is the Cluster API object that contains the cluster nodes’ configuration. The next function will do exactly that for you, by regenerating the config file for the existing cluster and then patching it.
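For reference, the part of the function that actually forces the nodes to roll is a single kubectl patch against the MachineDeployment; bumping an annotation on the machine template is enough for Cluster API to re-create the worker machines (the cluster name is a placeholder):

$ kubectl patch machinedeployment <WORKLOAD_CLUSTER_NAME>-md-0 --type merge -p "{\"spec\":{\"template\":{\"metadata\":{\"annotations\":{\"date\":\"$(date +%s)\"}}}}}"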

Note: This will fail if your cluster config file is not located in the following folder, with this (standard) naming convention: ~/.config/tanzu/tkg/clusterconfigs/<CLUSTER_NAME>_cluster_config.yaml.

So first define the function:

function patch-TKC() {
# EXAMPLE USAGE: patch-TKC 'tkg-services' 'mgmt'
# SOURCE: https://github.com/making/blog.ik.am/blob/master/content/00675.md
    local cluster_name=$1
    local management_cluster_name=$2
    initial_context=$(kubectl config current-context)
# Check whether the cluster has a static IP (VSPHERE_CONTROL_PLANE_ENDPOINT) and store it in a variable. Needed to temporarily replace it with a dummy IP to create a patch file.
    static_vsphere_control_plane_endpoint=$(sudo cat ~/.config/tanzu/tkg/clusterconfigs/"$cluster_name"_cluster_config.yaml | yq '.VSPHERE_CONTROL_PLANE_ENDPOINT')
    if [ "$static_vsphere_control_plane_endpoint" != "null" ]; then
        sudo cp ~/.config/tanzu/tkg/clusterconfigs/"$cluster_name"_cluster_config.yaml ~/"$cluster_name"_cluster_config-for-patch.yaml
# Assign a different IP address to VSPHERE_CONTROL_PLANE_ENDPOINT so that the dry-run cluster config creation does not fail
        sudo yq e -i ".VSPHERE_CONTROL_PLANE_ENDPOINT = \"1.1.1.1\"" ~/"$cluster_name"_cluster_config-for-patch.yaml
        tanzu cluster create $cluster_name -f ~/"$cluster_name"_cluster_config-for-patch.yaml --dry-run > ~/patch.yaml
        yes | rm ~/"$cluster_name"_cluster_config-for-patch.yaml
# Once the patch.yaml file exists, give VSPHERE_CONTROL_PLANE_ENDPOINT its original IP address back
        sed -i "s/1.1.1.1/$static_vsphere_control_plane_endpoint/g" ~/patch.yaml
    else
# If VSPHERE_CONTROL_PLANE_ENDPOINT was null, a patch file can be created directly from the cluster_config file
        tanzu cluster create $cluster_name -f ~/.config/tanzu/tkg/clusterconfigs/"$cluster_name"_cluster_config.yaml --dry-run > ~/patch.yaml
    fi
# Change the context to your management cluster, where you can patch your workload cluster deployments
    kubectl config use-context $management_cluster_name-admin@$management_cluster_name
# View the differences between the current live objects and the newly generated "dry-run" yaml, and apply those changes
    kubectl diff -f ~/patch.yaml
    kubectl apply -f ~/patch.yaml
# Bump an annotation on the MachineDeployment's machine template to force the worker nodes to be re-created
    kubectl patch machinedeployment $cluster_name-md-0 --type merge -p "{\"spec\":{\"template\":{\"metadata\":{\"annotations\":{\"date\":\"`date +'%s'`\"}}}}}"
    rm ~/patch.yaml
# Return to the initial context
    kubectl config use-context $initial_context
}

If the cluster configuration contains a VSPHERE_CONTROL_PLANE_ENDPOINT variable, the dry-run operation would fail because it recognizes that that vSphere control plane IP is already in use. To get around this, the function stores the VSPHERE_CONTROL_PLANE_ENDPOINT (if present) in a variable and replaces it with a different IP address (in this case 1.1.1.1), which allows it to create a patch file for the existing cluster. The function then puts the original IP address back into the patch file. Once the patch file is ready, it gets applied, and the MachineDeployment is subsequently patched.

After defining the function, patch the clusters by running patch-TKC '<WORKLOAD_CLUSTER_NAME>' '<MGMT_CLUSTER_NAME>'
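The patch triggers a rolling replacement of the worker nodes, which you can follow against the management cluster (this assumes the workload cluster was created in the default namespace):

$ kubectl get machines -w --context <MGMT_CLUSTER_NAME>-admin@<MGMT_CLUSTER_NAME>

Old machines get deleted and new ones, built with the certificate baked in, take their place.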

Once the function has been executed, you can check that the certificate is now present on any cluster node (on Photon nodes, look for /etc/ssl/certs/harbor-ca.pem instead):

$ ssh capv@<NODE_IP_ADDRESS> "cat /etc/ssl/certs/tkg-custom-ca.pem"

We can now run a pod using the nginx image we pushed to Harbor earlier:

$ kubectl run nginx --image $harbor_url/library/nginx
pod/nginx created

Verify that your pod is in a Running state. If it is, the nodes of your existing cluster now trust Harbor’s corporate CA. If you also want to check whether the certificate is present in the kapp-controller, I will refer you again to the helm-to-carvel-conversion-tool; follow the well-explained instructions there to verify.

Last words

It is very important to mention that, in production environments, you should use your usual trusted certificate authority to generate certificates for your package deployments, or configure your TLS traffic with Let’s Encrypt. You can do so by combining Contour with cert-manager to provision Let’s Encrypt TLS certificates for your workloads.
The steps on how to do that are described here.