Thursday, November 21, 2024

Kubernetes Taints and Tolerations – Explained

Orchestrators like Kubernetes have abstracted servers away, and now you can manage your whole infrastructure in a multi-tenant, heterogeneous Kubernetes cluster. You package your application in containers, and then Kubernetes takes care of availability, scaling, monitoring, and more across various nodes featuring specialized hardware or logical isolation.

Kubernetes has a lot of options and flexibility depending on what you need from it. One such piece of functionality is the concept of taints and tolerations, which helps you achieve selective scheduling.

In this article, we will look at taints and tolerations with some examples.

Why Use Taints and Tolerations?

Taints and tolerations help prevent your pods from being scheduled onto undesirable nodes. Suppose you want a particular node to run only a selected set of pods, such as your UI pods. Taints and tolerations make this possible.

What is Scheduling?

In Kubernetes, scheduling isn’t about timing, but about matching pods to nodes. When you create a pod, the scheduler in the control plane looks at the nodes and verifies available resources and other conditions before assigning the pod to a node.
If a node passes these checks, the pod is scheduled on it. If no node satisfies the conditions, the pod stays in the Pending state, which you may run into from time to time.

You can run kubectl describe pods name_of_pod and scroll down to Events to find the precise reason for the Pending state, or you can run kubectl get events. If the cause is a resource shortage, the events will say which resource is insufficient, such as CPU, memory, or sometimes storage.

If another node with adequate resources were available, the scheduler would place the pod there instead. Pods can end up in the Pending state for various reasons, so check the events or logs to find the cause and take the necessary action.
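The resource check described above can be pictured with a small Python sketch. It is only a simplified model, not the scheduler's real implementation: the helper names (cpu_millicores, memory_bytes, fits) are hypothetical, and only the "m" CPU suffix and the Ki/Mi/Gi memory suffixes are handled.

```python
MEMORY_UNITS = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30}


def cpu_millicores(quantity: str) -> int:
    """Convert a CPU quantity such as '250m' or '2' to millicores."""
    if quantity.endswith("m"):
        return int(quantity[:-1])
    return int(float(quantity) * 1000)


def memory_bytes(quantity: str) -> int:
    """Convert a memory quantity such as '128Mi' to bytes."""
    for suffix, factor in MEMORY_UNITS.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    return int(quantity)  # a plain number is already in bytes


def fits(requests: dict, free: dict) -> bool:
    """True if the pod's requests fit into the node's free capacity."""
    return (
        cpu_millicores(requests["cpu"]) <= cpu_millicores(free["cpu"])
        and memory_bytes(requests["memory"]) <= memory_bytes(free["memory"])
    )


pod_requests = {"cpu": "250m", "memory": "128Mi"}
print(fits(pod_requests, {"cpu": "2", "memory": "4Gi"}))     # roomy node
print(fits(pod_requests, {"cpu": "100m", "memory": "4Gi"}))  # too little CPU
```

If no node passes a check like this, the pod stays Pending and the events name the resource that was insufficient.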

[Figure: Kubernetes taints and tolerations]

How Do Taints Work?

The scheduler looks at the nodes, and if there are taints that the pods can’t tolerate, it doesn’t schedule the pod to that node.

Similar to labels, there can be many taints on your nodes, and the cluster’s scheduler will schedule a pod to your nodes only if it tolerates all of the taints.

You can add taint to your nodes with the following command:

# kubectl taint nodes <node name> <taint key>=<taint value>:<taint effect>

Here, <node name> is the name of the node that you want to taint, and the taint is described by a key-value pair plus an effect. In the above command, <taint key> is the key, <taint value> is the value, and <taint effect> is the effect.
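To make the three parts of the taint argument concrete, here is a minimal Python sketch that splits a key=value:effect string the way it is read above. The Taint type and parse_taint function are hypothetical names used for illustration, not part of kubectl or any Kubernetes client library.

```python
from typing import NamedTuple

VALID_EFFECTS = {"NoSchedule", "PreferNoSchedule", "NoExecute"}


class Taint(NamedTuple):
    key: str
    value: str
    effect: str


def parse_taint(spec: str) -> Taint:
    """Split a '<key>=<value>:<effect>' string into its three parts."""
    key_value, sep, effect = spec.rpartition(":")
    if not sep or effect not in VALID_EFFECTS:
        raise ValueError(f"missing or unknown effect in {spec!r}")
    # The value is optional: 'key:NoSchedule' is a taint with an empty value.
    key, _, value = key_value.partition("=")
    if not key:
        raise ValueError(f"missing key in {spec!r}")
    return Taint(key, value, effect)


print(parse_taint("key1=value1:NoSchedule"))
```

The effect is the part after the last colon, so a taint without an effect is rejected rather than guessed.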

The taints can produce three different outcomes depending on your taint-effect choice.

Taint effects define what will happen to pods if they don’t tolerate the taints. The three taint effects are:

  • NoSchedule: A strong effect where pods already running on the node are left alone, but new pods that don’t tolerate the taint will not be scheduled on it.
  • PreferNoSchedule: A soft effect where the system tries to avoid placing a pod that does not tolerate the taint on the node, but does not guarantee it.
  • NoExecute: A strong effect where pods already running on the node that don’t tolerate the taint are evicted, and new pods that don’t tolerate the taint will not be scheduled.

The three taint effects can be seen here:

# kubectl taint nodes node1 key1=value1:NoSchedule
# kubectl taint nodes node1 key1=value1:NoExecute
# kubectl taint nodes node1 key2=value1:PreferNoSchedule

How Do Tolerations Work?

Taints don’t allow pods to schedule on nodes with the set key-value property, but how will you schedule a pod to these nodes with taints?

That’s where tolerations come in. They help you schedule pods on the nodes with the taints. Tolerations are applied to your pods’ manifests in the following format:

tolerations:
- key: "key1"
  operator: "Equal"
  value: "value1"
  effect: "NoExecute"
  tolerationSeconds: 3600

If the taints and tolerations match, the pods can be scheduled on the tainted nodes, but there’s not a requirement that the pods be scheduled on the tainted nodes. If there’s a node without taints, a pod with tolerations can be scheduled on that node, even if there’s also an available node with tolerable taints.

Let’s look at some examples. We’ll add tolerations to the pods so that they are not repelled from the tainted node. Tolerations are specified in the PodSpec in one of the following formats, depending on the operator.

Equal Operator

tolerations:
- key: "<taint key>"
  operator: "Equal"
  value: "<taint value>"
  effect: "<taint effect>"

Exists Operator

tolerations:
- key: "<taint key>"
  operator: "Exists"
  effect: "<taint effect>"

The Equal operator requires the taint value and will not match if the value is different. The Exists operator, by contrast, matches any value because it only checks that the taint is defined.
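The difference between the two operators can be sketched in a few lines of Python. This is a simplified model of the matching rule, assuming taints and tolerations are plain dictionaries with the same field names as the manifests; the tolerates function is a hypothetical helper, not a Kubernetes API.

```python
def tolerates(toleration: dict, taint: dict) -> bool:
    """Return True if a single toleration matches a single taint."""
    # An empty effect on the toleration matches every effect.
    if toleration.get("effect") and toleration["effect"] != taint["effect"]:
        return False
    # An empty key (only meaningful with Exists) matches every key.
    if toleration.get("key") and toleration["key"] != taint["key"]:
        return False
    operator = toleration.get("operator", "Equal")  # Equal is the default
    if operator == "Exists":
        return True  # the value is ignored
    if operator == "Equal":
        return toleration.get("value", "") == taint.get("value", "")
    return False


gpu_taint = {"key": "gpu", "value": "true", "effect": "NoSchedule"}

equal_ok = {"key": "gpu", "operator": "Equal", "value": "true", "effect": "NoSchedule"}
equal_bad = {"key": "gpu", "operator": "Equal", "value": "false", "effect": "NoSchedule"}
exists = {"key": "gpu", "operator": "Exists", "effect": "NoSchedule"}

print(tolerates(equal_ok, gpu_taint))   # True: key, value, and effect all match
print(tolerates(equal_bad, gpu_taint))  # False: the value differs
print(tolerates(exists, gpu_taint))     # True: Exists ignores the value
```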

As shown below, we can use the Equal operator to tolerate a gpu=true:NoSchedule taint on a node.

apiVersion: v1
kind: Pod
metadata:
  name: nginx-test
  labels:
    env: test-env
spec:
  containers:
  - name: nginx
    image: nginx:latest
    imagePullPolicy: IfNotPresent
    resources:
      requests:
        memory: "128Mi"
        cpu: "250m"
      limits:
        memory: "512Mi"
        cpu: "500m"
  tolerations:
  - key: "gpu"
    operator: "Equal"
    value: "true"
    effect: "NoSchedule"

The diagram below details our configuration so far. We have two pods with different tolerations and different taint keys. Only the pod with the matching toleration, which is the gpu (taint key), will be allowed on the node.

[Figure: taint and toleration]

Additional Toleration Formats

In some cases, you may need to match any taint or any value of a taint key. You can meet this requirement using the Exists operator.

Match Any Taint

As shown below, we can match any taint in a node by simply defining the Exists operator without a key, value, or effect.

tolerations:
- operator: "Exists"

Match Any Value of a Taint Key

We can match any value and effect in any node with the specified taint key by defining the operator and the key, as shown below.

tolerations:
- operator: "Exists"
  key: "<taint key>"

Using Multiple Taints and Tolerations

Kubernetes supports multiple taints and tolerations on nodes and pods. This feature allows Kubernetes to process these taints and tolerations as a filter. It will look at all the available taints, ignore the ones that have a matching toleration, and apply the effect of the non-matching taint.

To further explain how this process takes effect, assume we have the following three taints on one of our nodes.

# kubectl taint nodes vps2 gpu=true:NoSchedule
# kubectl taint nodes vps2 project=system:NoExecute
# kubectl taint nodes vps2 type=process:NoSchedule

Then, we create a pod with tolerations that match only two of the taints.

apiVersion: v1
kind: Pod
metadata:
  name: nginx-test
  labels:
    env: test-env
spec:
  containers:
  - name: nginx
    image: nginx:latest
    imagePullPolicy: IfNotPresent
    resources:
      requests:
        memory: "128Mi"
        cpu: "250m"
      limits:
        memory: "512Mi"
        cpu: "500m"
  tolerations:
  - key: "gpu"
    operator: "Equal"
    value: "true"
    effect: "NoSchedule"
  - key: "project"
    operator: "Equal"
    value: "system"
    effect: "NoExecute"

In this situation, the first two taints have matching tolerations, but there is no toleration for the third taint. Therefore, the pod is repelled from this node, while pods already running on the node continue without interruption because the third taint’s effect is NoSchedule. If the unmatched taint had the NoExecute effect instead, all pods that do not tolerate it would be evicted immediately.
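The filtering behaviour can be sketched in Python. This is a simplified model rather than the scheduler's actual code: tolerates, untolerated, and outcome are hypothetical helpers, and the example assumes the project toleration uses the NoExecute effect so that it matches the project=system:NoExecute taint.

```python
def tolerates(toleration, taint):
    """True if one toleration matches one taint (key, value, and effect)."""
    if toleration.get("effect") and toleration["effect"] != taint["effect"]:
        return False
    if toleration.get("key") and toleration["key"] != taint["key"]:
        return False
    if toleration.get("operator", "Equal") == "Exists":
        return True
    return toleration.get("value", "") == taint.get("value", "")


def untolerated(taints, tolerations):
    """Filter step: keep only the taints that no toleration matches."""
    return [t for t in taints if not any(tolerates(tol, t) for tol in tolerations)]


def outcome(taints, tolerations):
    """Apply the strongest effect among the taints left after filtering."""
    effects = {t["effect"] for t in untolerated(taints, tolerations)}
    if "NoExecute" in effects:
        return "evict"   # not scheduled, and evicted if already running
    if "NoSchedule" in effects:
        return "repel"   # not scheduled, but left alone if already running
    if "PreferNoSchedule" in effects:
        return "avoid"   # scheduled here only if nothing better is available
    return "allow"


node_taints = [
    {"key": "gpu", "value": "true", "effect": "NoSchedule"},
    {"key": "project", "value": "system", "effect": "NoExecute"},
    {"key": "type", "value": "process", "effect": "NoSchedule"},
]
pod_tolerations = [
    {"key": "gpu", "operator": "Equal", "value": "true", "effect": "NoSchedule"},
    # The effect must equal the taint's effect for the toleration to match.
    {"key": "project", "operator": "Equal", "value": "system", "effect": "NoExecute"},
]

print(untolerated(node_taints, pod_tolerations))  # only the type=process taint is left
print(outcome(node_taints, pod_tolerations))      # repel
```

Dropping the project toleration from the list changes the result to "evict", because the leftover taint would then carry the NoExecute effect.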

NoExecute Effect

The NoExecute effect evicts pods from a node if they do not tolerate a specific taint. With the optional tolerationSeconds field in a NoExecute toleration, you can define how long a pod stays on the node after a matching taint is added. The pod remains on the node for the specified period and is then evicted. The NoExecute effect is applied according to the following rules:

  • Pods without the matching tolerations get evicted immediately.
  • Pods that have matching toleration and have specified tolerationSeconds field will remain within the node for the specified time before getting evicted. The pod will not get removed if the taint is removed before the time expires.
  • Pods that have matching tolerations and without the tolerationSeconds field will continue to live within the node unless manually removed.
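The three rules above can be written down as a small Python function. It is a simplified sketch for a single NoExecute taint; matches and seconds_until_eviction are hypothetical helpers, and math.inf stands for "never evicted because of this taint".

```python
import math


def matches(toleration, taint):
    """True if one toleration matches one taint (key, value, and effect)."""
    if toleration.get("effect") and toleration["effect"] != taint["effect"]:
        return False
    if toleration.get("key") and toleration["key"] != taint["key"]:
        return False
    if toleration.get("operator", "Equal") == "Exists":
        return True
    return toleration.get("value", "") == taint.get("value", "")


def seconds_until_eviction(taint, tolerations):
    """How long a running pod survives after a NoExecute taint is added."""
    matching = [t for t in tolerations if matches(t, taint)]
    if not matching:
        return 0  # rule 1: no matching toleration, evicted immediately
    limits = [
        t["tolerationSeconds"]
        for t in matching
        if t.get("tolerationSeconds") is not None
    ]
    if not limits:
        return math.inf  # rule 3: tolerated with no time limit
    return max(0, min(limits))  # rule 2: evicted once the time is up


taint = {"key": "cpu", "value": "true", "effect": "NoExecute"}
base = {"key": "cpu", "operator": "Equal", "value": "true", "effect": "NoExecute"}

print(seconds_until_eviction(taint, []))                                   # 0
print(seconds_until_eviction(taint, [{**base, "tolerationSeconds": 3600}]))  # 3600
print(seconds_until_eviction(taint, [base]))                               # inf
```

The countdown only matters while the taint is present: if the taint is removed before the time runs out, the pod is not evicted.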

Take a look at the following pod configuration with the effect NoExecute and tolerationSeconds set to 3600 seconds. When a matching taint is added to the node, the pod stays on the node for only 3600 seconds before being evicted. If the taint is removed before the time specified in tolerationSeconds has passed, the pod will not be evicted from the node.

apiVersion: v1
kind: Pod
metadata:
  name: nginx-test
  labels:
    env: test-env
spec:
  containers:
  - name: nginx
    image: nginx:latest
    imagePullPolicy: IfNotPresent
    resources:
      requests:
        memory: "128Mi"
        cpu: "250m"
      limits:
        memory: "512Mi"
        cpu: "500m"
  tolerations:
  - key: "cpu"
    operator: "Equal"
    value: "true"
    effect: "NoExecute"
    tolerationSeconds: 3600

Use Cases for Taints and Tolerations

Now that you understand what taints and tolerations are, you may wonder where, when, and how to use them. There are many ways, and this section goes through the four most prominent use cases.

Master Node Taints

The control plane components are hosted on your master (control plane) node, and you don’t want your applications to interfere with the core Kubernetes processes. By default, clusters set up with kubeadm taint this node with the NoSchedule effect (in current releases the taint key is node-role.kubernetes.io/control-plane) so that ordinary workloads cannot overload it. If your workers crash, new processes can be spawned, but there’s no such luxury if your master crashes, because the master controls everything else in your cluster.

You can remove the taints, but that’s not recommended in production environments. The command to remove is:

# kubectl taint nodes nodename key-

Dedicated Nodes for Specific Users

For some business requirements, you need logical isolation for your pods. An example of this would be using different nodes for your internal tools, customers, or different teams in your organization. You can achieve this with the help of taints and tolerations.

# kubectl taint nodes nodename group=groupName:NoSchedule

After the taints are applied, you can instruct your teams to use specific tolerations for their pods, ensuring that their pods can be scheduled to the correct nodes. As long as all of your nodes are tainted, this is a reliable way to segregate specific teams onto specific nodes.

If untainted nodes may exist in your cluster, pods with these tolerations can still land on them, so also use node affinity or a nodeSelector to make sure pods are scheduled to the desired nodes.
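As a rough illustration of combining the two mechanisms, the Python sketch below builds a pod manifest that carries both a toleration for the group taint and a nodeSelector. It assumes the dedicated nodes were also labelled with the same key and value (for example with kubectl label nodes nodename group=groupName). The dedicated_pod helper, the pod name, and the image are hypothetical.

```python
import json


def dedicated_pod(name: str, image: str, group: str) -> dict:
    """Build a pod manifest pinned to the nodes dedicated to one group."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "containers": [{"name": name, "image": image}],
            # Attraction: only nodes labelled group=<group> are eligible.
            "nodeSelector": {"group": group},
            # Permission: the pod may run on nodes tainted group=<group>:NoSchedule.
            "tolerations": [
                {
                    "key": "group",
                    "operator": "Equal",
                    "value": group,
                    "effect": "NoSchedule",
                }
            ],
        },
    }


manifest = dedicated_pod("internal-tool", "nginx:latest", "groupName")
print(json.dumps(manifest, indent=2))
```

The toleration alone only lets the pod onto the tainted nodes, while the nodeSelector keeps it off every other node. Since kubectl accepts JSON as well as YAML, the printed manifest could be saved to a file and applied as-is.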

Nodes with Special Hardware

Earlier, you read about scheduling for limited resources. But what if you require special hardware for your nodes? For example, how can you prevent pods that don’t need GPU from monopolizing resources on an expensive virtual machine with specialized hardware? Taints and tolerations are the answer to this problem.

# kubectl taint nodes nodename gpu=true:NoSchedule

With the above taint on the nodes, and respective toleration applied to your pods, you can make sure your specialized resources (like GPUs) are utilized by the pods that require them. The tolerations for this use case would look like this:

apiVersion: v1
kind: Pod
metadata:
  name: gpupod
  labels:
    env: prod
spec:
  containers:
  - name: ai-models
    image: aicompany/aimodel
    imagePullPolicy: IfNotPresent
  tolerations:
  - key: "gpu"
    operator: "Equal"
    value: "true"
    effect: "NoSchedule"

Taint-Based Eviction

As previously stated, the master node manages all other components in the cluster, and has a taint applied to it by default.

In some scenarios, the Kubernetes node controller automatically adds a NoExecute taint to a node. Pods that don’t tolerate the taint are evicted from the node, and pods managed by a controller such as a Deployment are then recreated on other available nodes.

A good example is when a node has high disk utilization. When this happens, a taint is added to your node so that no further pods are scheduled. This process is automatically done via the node controller in the control plane, and no manual intervention is required.

The other conditions where Kubernetes will add taints automatically are as follows:

  • node.kubernetes.io/not-ready: This indicates that the node is not ready, and the NodeCondition Ready is False.
  • node.kubernetes.io/unreachable: This indicates that the node controller is unable to reach the node, and the NodeCondition Ready is Unknown.
  • node.kubernetes.io/memory-pressure: The node is running out of available memory.
  • node.kubernetes.io/disk-pressure: The node is running out of disk space, or is using disk space at an unexpected rate.
  • node.kubernetes.io/pid-pressure: The node is running out of process IDs. If this happens, the node will be unable to start any new processes.
  • node.kubernetes.io/network-unavailable: The node’s network is unavailable.
  • node.kubernetes.io/unschedulable: The node is not schedulable.
  • node.cloudprovider.kubernetes.io/uninitialized: This taint is applied to mark a node started with an external cloud provider as unusable. It’s removed when a controller from the cloud-controller-manager initializes the node.
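The link between node state and the automatic taints listed above can be summarised in a small lookup table. The Python sketch below is an illustrative model only: expected_taints is a hypothetical helper, and the condition names other than Ready (MemoryPressure, DiskPressure, PIDPressure, NetworkUnavailable) are the standard node condition types, which the list above does not spell out.

```python
# (condition type, condition status) -> taint key added automatically
CONDITION_TAINTS = {
    ("Ready", "False"): "node.kubernetes.io/not-ready",
    ("Ready", "Unknown"): "node.kubernetes.io/unreachable",
    ("MemoryPressure", "True"): "node.kubernetes.io/memory-pressure",
    ("DiskPressure", "True"): "node.kubernetes.io/disk-pressure",
    ("PIDPressure", "True"): "node.kubernetes.io/pid-pressure",
    ("NetworkUnavailable", "True"): "node.kubernetes.io/network-unavailable",
}


def expected_taints(conditions, unschedulable=False):
    """List the taint keys you would expect for the given node state."""
    keys = []
    for condition in conditions:
        key = CONDITION_TAINTS.get((condition["type"], condition["status"]))
        if key:
            keys.append(key)
    if unschedulable:  # for example after the node has been cordoned
        keys.append("node.kubernetes.io/unschedulable")
    return keys


node_conditions = [
    {"type": "Ready", "status": "True"},
    {"type": "DiskPressure", "status": "True"},
    {"type": "MemoryPressure", "status": "False"},
]
print(expected_taints(node_conditions))
```

A healthy condition, such as Ready being True, maps to no taint at all, which is why only the disk-pressure key comes back for this node.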

Best Practices for Taints and Tolerations

  • Keep it simple: Do not overcomplicate taint keys and values. Keep them short and simple, which helps maintain them in the long term.
  • Check for matching taints and tolerations before using NoExecute: Ensure the pods that must stay on the node have tolerations matching the taint before applying the NoExecute effect.
  • Monitor cluster costs: Overuse of taints and tolerations can constrain the scheduler, impacting the efficiency of the cluster and resulting in higher costs. Kubecost allows administrators to monitor and manage costs within the cluster and optimize scheduling.
  • Use node affinity to schedule pods on specific nodes: Taints and tolerations only repel pods, so they should not be used as the mechanism for placing pods on specific nodes. Relying on them alone may be viable in smaller clusters, where repelled pods simply land on any other available node. Otherwise, implement placement with node affinity combined with taints and tolerations.

Taints and tolerations are most useful when combined with nodeSelector and node affinity. Used together, these properties make managing pod placement much simpler.
