How to Troubleshoot DaemonSet?

We recently covered Kubernetes DaemonSets, and in this article we will see how to troubleshoot a DaemonSet when something goes wrong with it.

What is a Kubernetes DaemonSet?

A DaemonSet ensures that a pod is scheduled and running on all (or a selected subset) of the available nodes in the Kubernetes cluster. In effect, it runs a copy of the desired pod on every eligible node.

Whenever a new node is added to the cluster, a pod is added to that node as well. Similarly, when a node is removed, the DaemonSet controller ensures that the pod associated with that node is garbage collected.

DaemonSets are an integral part of a Kubernetes cluster, making it easy for administrators to run services (pods) across all nodes or a subset of them.

How do DaemonSets Work?

A DaemonSet is an active Kubernetes object managed by a controller. You declare the desired state, indicating that a particular pod should exist on every node. The reconciliation control loop compares this desired state with the currently observed state. If a monitored node does not have a matching pod, the DaemonSet controller creates one for you.

This automated process covers existing nodes as well as all newly created nodes. Pods created by the DaemonSet controller are bound to their node and exist for as long as the node itself exists.

A DaemonSet creates pods on every node by default. If desired, you can use a node selector to limit the nodes it targets: the DaemonSet controller only creates pods on nodes that match the nodeSelector field defined in the YAML file.
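
For example, here is a minimal sketch of the relevant part of a DaemonSet pod template that targets only nodes carrying a hypothetical monitoring=true label:

spec:
  template:
    spec:
      nodeSelector:
        monitoring: "true"

A node can be given that label with:

# kubectl label node {{ node-name }} monitoring=true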

Create a DaemonSet

To create a DaemonSet, you define a YAML manifest file and apply it to the cluster using kubectl apply. The DaemonSet YAML file specifies the pod template used to run a pod on each node. It can also specify node selectors or tolerations that determine which nodes DaemonSet pods can be scheduled on.

Here is an example of a DaemonSet manifest file, taken from the Kubernetes documentation.

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: fluentd-elasticsearch
  namespace: kube-system
  labels:
    k8s-app: fluentd-logging
spec:
  selector:
    matchLabels:
      name: fluentd-elasticsearch
  template:
    metadata:
      labels:
        name: fluentd-elasticsearch
    spec:
      tolerations:
      - key: node-role.kubernetes.io/master
        operator: Exists
        effect: NoSchedule
      containers:
      - name: fluentd-elasticsearch
        image: quay.io/fluentd_elasticsearch/fluentd:v2.5.2
        resources:
          limits:
            memory: 200Mi
          requests:
            cpu: 100m
            memory: 200Mi
        volumeMounts:
        - name: varlog
          mountPath: /var/log
        - name: varlibdockercontainers
          mountPath: /var/lib/docker/containers
          readOnly: true
      terminationGracePeriodSeconds: 30
      volumes:
      - name: varlog
        hostPath:
          path: /var/log
      - name: varlibdockercontainers
        hostPath:
          path: /var/lib/docker/containers

First, let’s create the DaemonSet using the kubectl create command and retrieve the DaemonSet and pod information as follows:

# kubectl create -f daemonset-example.yaml

# kubectl get daemonset

# kubectl get pod -o wide
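
On a small cluster the output of kubectl get daemonset looks roughly like the following; the counts depend entirely on how many nodes you have, so treat these numbers as illustrative:

NAME                    DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
fluentd-elasticsearch   3         3         3       3            3           <none>          1m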

As you can see from the above output, our DaemonSet has been successfully deployed. Depending on the nodes available in the cluster, it automatically scales to match the number of nodes, or the subset of nodes defined in the configuration.

Troubleshooting DaemonSet

A DaemonSet is unhealthy if it does not have one pod running on every node it is meant to cover. Use the following steps to diagnose and resolve the most common DaemonSet issues.

However, note that DaemonSet troubleshooting can get complex, and issues can involve multiple parts of your Kubernetes environment. For complex troubleshooting scenarios you may need specialized tools to diagnose and resolve the problem; you can refer to our Troubleshooting article series for more details.

1. List Pods in the DaemonSet

Run this command to see all the pods in the DaemonSet:

# kubectl get pod -l app={{ label }} -n {{ namespace }}

Identify which of the pods have a status of CrashLoopBackOff, Pending, or Evicted. For any pod that seems to be having issues, run this command to get more information about it:

# kubectl describe pod {{ pod-name }} -n {{ namespace }}

Or use this command to get logs for the pod:

# kubectl logs {{ pod-name }} -n {{ namespace }}
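
For example, against the fluentd-elasticsearch DaemonSet created earlier (whose pods are labeled name=fluentd-elasticsearch rather than app=...), the commands would look like this; the exact pod name is hypothetical and will differ in your cluster:

# kubectl get pod -l name=fluentd-elasticsearch -n kube-system -o wide

# kubectl describe pod fluentd-elasticsearch-x7k2p -n kube-system

# kubectl logs fluentd-elasticsearch-x7k2p -n kube-system

In the describe output, pay particular attention to the Events section at the bottom, which usually explains why a pod is Pending or restarting.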

2. Check if Pods are Running Out of Resources

A common cause of CrashLoopBackOff errors or scheduling issues on the nodes is a lack of resources available to run the pod.

To identify which node the pod is running on, run this command:

# kubectl get pod {{ pod-name }} -n {{ namespace }} -o wide

To view currently available resources on the node, get the node name from the previous command and run:

# kubectl top node {{ node-name }}
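
Note that kubectl top requires the metrics-server add-on to be installed. The output looks roughly like the following, with purely illustrative values; compare the usage against the node's allocatable capacity, which kubectl describe node {{ node-name }} also reports:

NAME            CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%
worker-node-1   850m         85%    3200Mi          87%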

Use the following strategies to resolve the issue:

  • Reduce the requested CPU and memory of the DaemonSet (see the sketch after this list).
  • Move some pods off the affected nodes to free up resources.
  • Scale nodes vertically, for example by upgrading them to a bigger compute instance.
  • Apply taints to nodes that lack the resources to run the pod; unless the DaemonSet manifest carries a matching toleration, its pods will not be scheduled on those nodes.
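
As a minimal sketch of the first strategy, the resources block of the fluentd-elasticsearch example above could be lowered as follows; the exact values are assumptions and should be sized to your actual workload:

        resources:
          limits:
            memory: 150Mi
          requests:
            cpu: 50m
            memory: 100Mi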

If it is not essential to run exactly one pod per node, consider using a Deployment object instead. This will give you more control over the number and locations of pods running.
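
For reference, here is a minimal sketch of an equivalent Deployment, assuming three replicas are sufficient; the labels and image are reused from the example above, and the fixed replica count is the main difference from the DaemonSet:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: fluentd-elasticsearch
  namespace: kube-system
spec:
  replicas: 3
  selector:
    matchLabels:
      name: fluentd-elasticsearch
  template:
    metadata:
      labels:
        name: fluentd-elasticsearch
    spec:
      containers:
      - name: fluentd-elasticsearch
        image: quay.io/fluentd_elasticsearch/fluentd:v2.5.2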

3. Debug Container Issues

If the pods appear to be scheduled and running correctly, the problem may be with an individual container inside the pod. The first step is to check which image is specified in the DaemonSet manifest and make sure it is the right image.
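
A quick way to do that is to read the image straight from the live object, for example for the fluentd-elasticsearch DaemonSet above:

# kubectl get daemonset fluentd-elasticsearch -n kube-system -o jsonpath='{.spec.template.spec.containers[0].image}'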

If it is, gain shell access to the node and start a throwaway container from the same image with the following command (for a Docker-based node); this gives you an interactive shell in a fresh copy of the container:

# docker run -ti --rm {{ image }} /bin/bash

Try to identify if there are application errors or configuration issues preventing the container from running properly.
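
If the container is already running, you can also open a shell inside the live pod directly with kubectl exec; the pod name here is hypothetical, and the image must actually ship a shell for this to work:

# kubectl exec -it fluentd-elasticsearch-x7k2p -n kube-system -- /bin/bash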
