Kubernetes Labels and Annotations – When to Use with Best Practices

Kubernetes has many moving parts, and working with it efficiently means understanding quite a few of them. One of the most important is metadata, namely labels and annotations. Each plays its own role when you configure and operate a cluster, whether that is stitching multiple resources together or simply giving developers and DevOps/SRE engineers more context.

In this article, we will see both types of metadata in action, learn how to work with them, and go through the best practices.

Kubernetes Labels

Kubernetes labels are metadata attached to Kubernetes resources so that you can group, view, and operate on them. Labels are key-value string pairs, and each key must be unique within a given object.

When creating a new label, you must comply with the restrictions Kubernetes places on the length and allowed values. A label value must:

  • contain 63 characters or less (a label’s value can also be empty),
  • start and end with an alphanumeric character (unless it’s empty),
  • only include dashes (-), underscores (_), dots (.), and alphanumerics.
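The rules above can be checked mechanically. The following Python sketch encodes them as a regular expression; the function name is_valid_label_value is made up for this illustration and is not part of any Kubernetes client library.

```python
import re

# Assumed encoding of the three rules listed above: either empty, or 1 to 63
# characters that start and end with an alphanumeric and contain only
# alphanumerics, dashes, underscores, and dots in between.
_LABEL_VALUE = re.compile(r"(?:[A-Za-z0-9](?:[A-Za-z0-9_.-]{0,61}[A-Za-z0-9])?)?")


def is_valid_label_value(value: str) -> bool:
    """Return True if value satisfies the label value rules (hypothetical helper)."""
    return _LABEL_VALUE.fullmatch(value) is not None


for candidate in ["production", "Standard_B4ms", "", "-starts-with-dash", "has space"]:
    print(repr(candidate), is_valid_label_value(candidate))
```

Running a check like this in a CI pipeline catches malformed values before the API server rejects the manifest.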

Let’s check the labels on one of the nodes of an AKS cluster:

# kubectl get node aks-agentpool-32451571-vmss00000a -o json | jq .metadata.labels
{
  "agentpool": "agentpool",
  "beta.kubernetes.io/arch": "amd64",
  "beta.kubernetes.io/instance-type": "Standard_B4ms",
  "beta.kubernetes.io/os": "linux",
  "failure-domain.beta.kubernetes.io/region": "eastus",
  "failure-domain.beta.kubernetes.io/zone": "0",
  "kubernetes.azure.com/agentpool": "agentpool",
  "kubernetes.azure.com/cluster": "MC_foxutech_aaks_eastus",
  "kubernetes.azure.com/kubelet-identity-client-id": "5ce7d595-99b9-4122-80af-ed14f5215924",
  "kubernetes.azure.com/mode": "system",
  "kubernetes.azure.com/node-image-version": "AKSUbuntu-1804gen2containerd-202303.13.0",
  "kubernetes.azure.com/os-sku": "Ubuntu",
  "kubernetes.azure.com/role": "agent",
  "kubernetes.azure.com/storageprofile": "managed",
  "kubernetes.azure.com/storagetier": "Premium_LRS",
  "kubernetes.io/arch": "amd64",
  "kubernetes.io/hostname": "aks-agentpool-32451571-vmss00000a",
  "kubernetes.io/os": "linux",
  "kubernetes.io/role": "agent",
  "node-role.kubernetes.io/agent": "",
  "node.kubernetes.io/instance-type": "Standard_B4ms",
  "storageprofile": "managed",
  "storagetier": "Premium_LRS",
  "topology.disk.csi.azure.com/zone": "",
  "topology.kubernetes.io/region": "eastus",
  "topology.kubernetes.io/zone": "0"
}

This command retrieves the labels of the AKS node, which include its operating system, CPU architecture, hostname, instance type, region and zone, and node image version. You can use labels to retrieve and filter data from the Kubernetes API.

Let’s assume you want to get all the Argo CD Redis HA pods. You can use the label selector app.kubernetes.io/name=argocd-redis-ha with the following command:

# kubectl get pods -n argocd --selector app.kubernetes.io/name=argocd-redis-ha
NAME                       READY   STATUS    RESTARTS   AGE
argocd-redis-ha-server-0   3/3     Running   0          6m44s
argocd-redis-ha-server-1   3/3     Running   0          4m36s
argocd-redis-ha-server-2   3/3     Running   0          3m35s 

The hidden gem of Kubernetes labels is that Kubernetes itself relies on them heavily, for example when scheduling pods to nodes, managing the replicas of deployments, and routing network traffic for services.

Let’s look at some labels and how they are used as selectors in Kubernetes by checking the spec of the argocd-redis-ha service:

# kubectl get svc argocd-redis-ha -n argocd -o json| jq .spec
{
  "clusterIP": "None",
  "clusterIPs": [
    "None"
  ],
  "internalTrafficPolicy": "Cluster",
  "ipFamilies": [
    "IPv4"
  ],
  "ipFamilyPolicy": "SingleStack",
  "ports": [
    {
      "name": "tcp-server",
      "port": 6379,
      "protocol": "TCP",
      "targetPort": "redis"
    },
    {
      "name": "tcp-sentinel",
      "port": 26379,
      "protocol": "TCP",
      "targetPort": "sentinel"
    }
  ],
  "selector": {
    "app.kubernetes.io/name": "argocd-redis-ha"
  },
  "sessionAffinity": "None",
  "type": "ClusterIP"
} 

Kubernetes uses the labels in the selector section to decide which pods receive the requests sent to the argocd-redis-ha service. In a similar way, a StatefulSet uses a selector to track the pods it owns and keep the desired number of replicas running. Now let’s check the selector of the Redis StatefulSet:

# kubectl get sts argocd-redis-ha-server -n argocd -o json | jq .spec.selector
{
  "matchLabels": {
    "app.kubernetes.io/name": "argocd-redis-ha"
  }
} 

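The matchLabels field tells the controller which pods count towards its replicas, so it keeps enough pods with these labels running in the cluster. Release mechanics also rely on labels: when you roll out a new version of a Deployment, its pods get a new pod-template-hash label and the ReplicaSet controller creates new pods, while a StatefulSet marks the pods of each revision with a controller-revision-hash label.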
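To make the matching rule concrete, here is a small Python sketch of how a matchLabels-style selector picks pods: a pod is selected when every key-value pair of the selector appears among its labels. This is a simplified model, not the Kubernetes implementation; the helper name matches and the argocd-server-0 pod are hypothetical.

```python
def matches(selector: dict, labels: dict) -> bool:
    """True when every key/value pair of the selector is present in labels."""
    return all(labels.get(key) == value for key, value in selector.items())


# Redis pod names are taken from the kubectl output above; argocd-server-0
# and its label are invented here to show a pod that is not selected.
pods = {
    "argocd-redis-ha-server-0": {"app.kubernetes.io/name": "argocd-redis-ha"},
    "argocd-redis-ha-server-1": {"app.kubernetes.io/name": "argocd-redis-ha"},
    "argocd-redis-ha-server-2": {"app.kubernetes.io/name": "argocd-redis-ha"},
    "argocd-server-0": {"app.kubernetes.io/name": "argocd-server"},
}

selector = {"app.kubernetes.io/name": "argocd-redis-ha"}
selected = sorted(name for name, labels in pods.items() if matches(selector, labels))
print(selected)
```

Extra labels on a pod do not prevent a match; only the keys named in the selector are compared.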

Use cases:

  1. Group resources for object queries
  2. Bulk operations like deleting or querying selected labelled resources.
  3. Schedule pods based on node labels.

Kubernetes Annotations

Kubernetes annotations are the second way of attaching metadata to the Kubernetes resources. They are pairs of key and value strings that are like labels, but which store arbitrary non-identifying data. For instance, you can keep the contact details of the responsible people in the deployment annotations. Similarly, you can attach logging, monitoring, or auditing information for the resources in the annotations format.

The main difference between annotations and labels is that annotations are not used to filter, group, or operate on the resources. Rather, they are used to easily access additional information about the Kubernetes resources.

For instance, in the following example the CSI node ID and volume controller annotations describe how the node operates rather than its characteristics:

# kubectl get no aks-agentpool-32451571-vmss00000a -o json | jq .metadata.annotations
{
  "csi.volume.kubernetes.io/nodeid": "{\"disk.csi.azure.com\":\"aks-agentpool-32451571-vmss00000a\",\"file.csi.azure.com\":\"aks-agentpool-32451571-vmss00000a\"}",
  "node.alpha.kubernetes.io/ttl": "0",
  "volumes.kubernetes.io/controller-managed-attach-detach": "true"
}

Client tools and Kubernetes users can retrieve this metadata and act on it. You can think of annotations as data you might otherwise keep in Excel sheets or databases, except that it is attached directly to the resources. Because annotations are not meant for identifying objects, the Kubernetes API offers no selector mechanism for them as it does for labels.

Best Practices

Now that we have covered the fundamentals of Kubernetes labels and annotations, it’s time to explore the best practices for getting the most out of them.

Use the Correct Syntax

Annotations and labels are key-value pairs. A key consists of two parts, an optional (but highly recommended) prefix and a name, separated by a slash (/):

<prefix>

The prefix is optional; if you use one, it must be a valid DNS subdomain (such as “cast.ai”) of no more than 253 characters in total. Prefixes are meant for tools and commands that are shared rather than private to a single user. They also let teams use labels that would otherwise conflict (think of the ones set by third-party packages).

Note that the kubernetes.io/ and k8s.io/ prefixes are reserved for Kubernetes core components.

<name>

This part refers to the arbitrary property name of the label. Teams can use the name “environment” with label values such as “production” or “testing” for clarity.

A name must meet the same requirements as the label value, but it can’t be empty. Hence, the name needs to have 63 characters or less, beginning and ending with an alphanumeric character ([a-z0-9A-Z]) with dashes (-), underscores (_), dots (.), and alphanumerics in between.
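The prefix and name rules can be combined into one check. The Python sketch below is an illustration under stated assumptions: a DNS subdomain is taken to mean dot-separated segments of lowercase alphanumerics and dashes that start and end with an alphanumeric, and the helper name is_valid_key is hypothetical.

```python
import re

# Name part: 1 to 63 characters, alphanumeric at both ends,
# with dashes, underscores, dots, and alphanumerics in between.
_NAME = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9_.-]{0,61}[A-Za-z0-9])?")
# One segment of a DNS subdomain (assumption: lowercase letters, digits, dashes).
_DNS_SEGMENT = re.compile(r"[a-z0-9](?:[a-z0-9-]*[a-z0-9])?")


def is_valid_key(key: str) -> bool:
    """Validate an optional '<prefix>/' followed by a required '<name>'."""
    prefix, slash, name = key.rpartition("/")
    if slash:
        # A slash is present, so the prefix must be a non-empty DNS subdomain
        # of at most 253 characters.
        if not prefix or len(prefix) > 253:
            return False
        if not all(_DNS_SEGMENT.fullmatch(part) for part in prefix.split(".")):
            return False
    return _NAME.fullmatch(name) is not None


for key in ["environment", "app.kubernetes.io/version", "helm.sh/chart", "/name", "Cast.AI/owner"]:
    print(key, is_valid_key(key))
```

Splitting on the last slash means a key with two slashes leaves a slash inside the prefix, which then fails the subdomain check.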

When the prefix is omitted, the label or annotation is presumed to be private to your cluster and its users. Use a prefix together with the name when the data is meant to be shared with multiple clients and tools, as in the following keys:

app.kubernetes.io/version
app.kubernetes.io/component
helm.sh/chart

Using the correct syntax for labels and annotations makes it easier to communicate within your team and use the cluster with client tools and libraries such as kubectl, Helm, and operators. Therefore, it is suggested to choose a prefix for your company and sub-prefixes for your projects. This company-wide consensus will help you utilize labels and annotations to their full power.

Understanding why and when to use Kubernetes labels

As mentioned earlier, the main difference between labels and annotations is whether they are identifiers or not. If you want to attach information to group resources and filter, you should keep the data as labels. Use annotations if the metadata is not an identifier, but rather additional data related to the Kubernetes resources.

For instance, the following pod has two labels and two annotations:

apiVersion: v1
kind: Pod
metadata:
  name: demo
  labels:
    environment: production
    app: nginx
  annotations:
    foxutech.com/owner: motoskia
    foxutech.com/country: india
spec:
  containers:
  - name: nginx
    image: nginx
    ports:
    - containerPort: 80

In the demo pod, the labels classify it as an nginx application running in production, while the annotations record the owner and country. If you plan to group pods by owner in the future, move foxutech.com/owner to the labels.

Using labels and annotations with the correct use cases is vital to have an easy-to-operate cluster with automated tools. Therefore, ensure that your labels and annotations are not overlapping in terms of data and usage.

Exploit the Standard Labels and Annotations

Kubernetes reserves all labels and annotations whose keys use the kubernetes.io and k8s.io prefixes, and keeps a list of the well-known ones in the official documentation. You may have seen some of them in the Kubernetes dashboard or resource definitions, such as:

labels:
  app.kubernetes.io/name: java-app
  app.kubernetes.io/instance: m2-instance
  app.kubernetes.io/version: "2.5.6"
  app.kubernetes.io/component: jboss
  app.kubernetes.io/managed-by: helm

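The main advantage of this metadata is that it is standardized. Kubernetes components fill in many of the well-known labels and annotations automatically (for example, kubernetes.io/hostname and topology.kubernetes.io/zone on nodes), whereas the recommended app.kubernetes.io labels are set by you or by tools such as Helm. It is therefore a good idea to use the well-known labels and annotations in your daily operations and client tools, such as Helm, Terraform, or kubectl, so that they all describe resources in the same way.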

Use Labels for Release Management

Releasing distributed microservices applications to the cloud is not straightforward, because you have a large number of small applications, each with its own version. As a result, developers usually change the version of a single application out of a hundred and test it against the rest of the system. Fortunately, you can use labels for grouping and filtering the applications running on Kubernetes.

Let’s assume you have a backend service that has multiple pods running behind it with the labels version:v1 and app:backend. You can deploy a new set of backend instances to the cluster and change the service label selector to version:v2 and app:backend. Now, all requests coming to the backend service will reach v2 instances. Luckily, switching back to v1 is pretty easy, as you only need to change the service specification.

This procedure is also known as the Blue/Green deployment strategy. In addition, you can easily implement A/B testing and canary release strategies with the help of Kubernetes labels.
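The selector switch described above can be modelled in a few lines of Python. This is a simplified sketch of the idea, not how Kubernetes computes endpoints internally; the pod names and the helper functions matches and endpoints are hypothetical.

```python
def matches(selector: dict, labels: dict) -> bool:
    """True when every key/value pair of the selector is present in labels."""
    return all(labels.get(key) == value for key, value in selector.items())


def endpoints(selector: dict, pods: dict) -> list:
    """Names of the pods that a service with this selector would route to."""
    return sorted(name for name, labels in pods.items() if matches(selector, labels))


# Both versions run side by side; only the service selector decides
# which set of pods receives traffic.
pods = {
    "backend-v1-a": {"app": "backend", "version": "v1"},
    "backend-v1-b": {"app": "backend", "version": "v1"},
    "backend-v2-a": {"app": "backend", "version": "v2"},
    "backend-v2-b": {"app": "backend", "version": "v2"},
}

service_selector = {"app": "backend", "version": "v1"}
before = endpoints(service_selector, pods)

# The cut-over: change one selector value. Rolling back is the reverse edit.
service_selector = {**service_selector, "version": "v2"}
after = endpoints(service_selector, pods)

print(before)
print(after)
```

Because the v1 pods keep running after the switch, rolling back does not require a new deployment.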

How to Use Labels for Troubleshooting

The last best practice is for the Kubernetes operators who need to debug applications running inside the cluster. Let’s assume you have a deployment with the following selector labels:

app.kubernetes.io/name: java-app
app.kubernetes.io/instance: m4-instance
app.kubernetes.io/version: "2.5.6"

All pods of the deployment carry the same set of labels. Most fields of a running pod cannot be changed, and neither can the selector of an apps/v1 deployment once it has been created, but pod labels can be edited at any time. If you change a label on a misbehaving pod so that it no longer matches the selector, the pod becomes orphaned: it keeps running, but its controller stops managing it, and any service whose selector includes that label stops sending it traffic. You can then exec into it for debugging.

Kubernetes will create a replacement pod from the pod template, and your production setup will continue living as expected, with an additional pod left over for you to analyse. Once you know how Kubernetes designs and uses labels, you can step into its operations like this and troubleshoot your applications.
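The following Python sketch models what happens once a pod stops matching its controller's selector. It is a toy reconcile loop under stated assumptions, not the real controller: the pod names, the java-app-debug label value, and the functions matches and reconcile are all hypothetical, and the selector is reduced to a single label for brevity.

```python
from itertools import count


def matches(selector: dict, labels: dict) -> bool:
    """True when every key/value pair of the selector is present in labels."""
    return all(labels.get(key) == value for key, value in selector.items())


def reconcile(pods: dict, selector: dict, template: dict, replicas: int, new_names) -> None:
    """Create pods from the template until `replicas` pods match the selector."""
    owned = [name for name, labels in pods.items() if matches(selector, labels)]
    for _ in range(replicas - len(owned)):
        pods[next(new_names)] = dict(template)


selector = {"app.kubernetes.io/name": "java-app"}
template = {
    "app.kubernetes.io/name": "java-app",
    "app.kubernetes.io/instance": "m4-instance",
}
pods = {f"java-app-{i}": dict(template) for i in range(3)}
new_names = (f"java-app-{i}" for i in count(3))

# Relabel the pod under investigation so that it no longer matches.
pods["java-app-1"]["app.kubernetes.io/name"] = "java-app-debug"

# The controller sees only two matching pods and creates a third one.
reconcile(pods, selector, template, replicas=3, new_names=new_names)

owned = sorted(name for name, labels in pods.items() if matches(selector, labels))
print(owned)
print(sorted(pods))
```

With kubectl, the relabeling step is typically done with `kubectl label pod <pod> <key>=<value> --overwrite`; remember to delete the orphaned pod by hand once you have finished debugging, since no controller will clean it up.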

Cheat Sheet

Best practice: Summary

  • Know the syntax: Kubernetes enforces syntax on label keys and values. You need to review the documentation and use them according to the standards.
  • Know the different label selection methods: Equality and set-based selectors allow powerful ways of grouping and operating on different resource objects.
  • Use the Kubernetes recommended labels: Review and apply the common labels recommended in the Kubernetes documentation.
  • Create an organization-wide convention for labels: Labelling requires a consistent naming convention adopted by all groups and departments. Err on the side of simplicity over complexity.
  • Include required labels in pod templates: Include a subset of labels in the pod templates used by workload controllers to drive labelling consistency.
  • Label extensively: Tooling (such as monitoring) requires consistent labelling to be reliable. Remember that your tooling often filters by label and won’t see an unlabelled object.
  • Label cross-cutting concerns: To monitor resources according to structural differences, label them even if they cross-cut different concerns or share them.
  • Automate labelling: Use CI/CD (continuous integration and continuous delivery) tooling to automate labelling.
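The cheat sheet mentions equality and set-based selectors. As a rough illustration, the Python sketch below evaluates the four set-based operators that Kubernetes accepts in matchExpressions (In, NotIn, Exists, DoesNotExist) against a set of labels. The function name requirement_matches is hypothetical, and the sketch assumes that NotIn also matches objects that lack the key.

```python
def requirement_matches(labels: dict, key: str, operator: str, values=()) -> bool:
    """Evaluate one set-based requirement against an object's labels."""
    if operator == "In":
        return labels.get(key) in values
    if operator == "NotIn":
        # Assumption: objects without the key also satisfy NotIn.
        return labels.get(key) not in values
    if operator == "Exists":
        return key in labels
    if operator == "DoesNotExist":
        return key not in labels
    raise ValueError(f"unknown operator: {operator}")


labels = {"environment": "production", "app": "nginx"}

checks = [
    requirement_matches(labels, "environment", "In", ("production", "testing")),
    requirement_matches(labels, "environment", "NotIn", ("testing",)),
    requirement_matches(labels, "app", "Exists"),
    requirement_matches(labels, "tier", "DoesNotExist"),
]
print(checks)
```

An equality selector such as environment=production behaves like In with a single value, which is why the two styles can be mixed in one query.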

3 Kubernetes labelling practices to avoid

When Kubernetes labelling initiatives go awry, it is often because of one or more of the following reasons.

1. Using labels to store data that often changes

Labels shouldn’t store frequently changing data. An example is tracking the size of a database by storing the number of rows as a label. Unless that database gets updated only at fixed times, you do not want to do this.

2. Storing application-level semantics

While Kubernetes labels can join resource objects with metadata, they aren’t meant to act as a data store for applications. Because Kubernetes resources are often used for only a short time and are loosely associated with applications, labels soon become out of sync.

3. Getting loose with label names

Strict labelling conventions are a best practice for a reason. Loose label naming significantly increases the time and difficulty of querying the information you are looking for.

Hope this was useful. See you again with another interesting topic.
