Tuesday, November 5, 2024

How to Troubleshoot Kubernetes PVC

In previous posts, we covered various scenarios for troubleshooting Kubernetes errors. Today, we will look at how to troubleshoot Kubernetes PVCs, starting with the basics and how to create a PV and a PVC.

In Kubernetes, there are separate mechanisms for managing compute resources and storage resources. A storage volume is a construct that allows Kubernetes users and administrators to gain access to storage resources, while abstracting the underlying storage implementation.

Kubernetes provides two API resources that allow pods to access persistent storage:

1. PersistentVolume (PV)

A PV represents storage in the cluster, provisioned manually by an administrator, or automatically using a StorageClass. A PV is an independent resource in the cluster, with a separate lifecycle from any individual pod that uses it. When a pod shuts down, the PV remains in place and can be mounted by other pods. Behind the scenes, the PV object interfaces with physical storage equipment using NFS, iSCSI, or public cloud storage services.

2. PersistentVolumeClaim (PVC)

A PVC represents a request for storage by a Kubernetes user. Users define a PVC and reference it in a pod specification, and Kubernetes then looks for an appropriate PV that can satisfy the claim. When it finds one, the PV “binds” to the PVC, and the pod accesses the storage through that claim.

PVs and PVCs are analogous to nodes and pods. Just like a node is a computing resource, and a pod seeks a node to run on, a PersistentVolume is a storage resource, and a PersistentVolumeClaim seeks a PV to bind to.

The PVC is a complex mechanism that is the cause of many Kubernetes issues, some of which can be difficult to diagnose and resolve. In this post, let’s see the most common issues and basic strategies for troubleshooting them.

Persistent Volume and Claim Lifecycle

PVs and PVCs follow a lifecycle that includes the following stages:

  1. Provisioning: a PV can be provisioned either manually, by an administrator, or dynamically, by a StorageClass when a PVC requests storage that no existing PV satisfies.
  2. Binding: when a user creates a PVC, Kubernetes searches for a PV with the required amount of storage and other requested storage criteria, and binds it exclusively to that PVC.
  3. Using: at this stage, pods use the claim as a volume, and the bound PV stays reserved for that PVC for as long as the claim exists.
  4. Storage Object in Use Protection: this feature avoids data loss by deferring the removal of a PVC that is in active use by a pod, and of a PV that is bound to a PVC, until they are no longer in use.
  5. Reclaiming: when users do not need a PV anymore, they can delete the PVC object. Once the claim has been released, the cluster uses the PV’s reclaim policy to determine what to do with the PV: retain, recycle, or delete it.
  6. Retain: this reclaim policy enables PVs to be manually reclaimed. The PV continues existing without binding to any PVC. However, because it still includes data belonging to the previous user, it needs to be manually configured and cleaned before reuse.
  7. Delete: this reclaim policy enables the cluster to remove the PV object and the associated storage resources in the external infrastructure. This is the default for dynamically provisioned PVs.
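
As a rough sketch of how the reclaim policy can be inspected and changed, the two helpers below wrap kubectl calls. The function names are hypothetical (they are not part of kubectl), and the PV name is whatever your cluster uses.

```shell
# Hypothetical helpers; the names are illustrative, not part of kubectl.

# Print the reclaim policy of a PV, e.g. Retain or Delete.
# $1 = PV name
get_reclaim_policy() {
  kubectl get pv "$1" -o jsonpath='{.spec.persistentVolumeReclaimPolicy}'
}

# Change the reclaim policy of a PV.
# $1 = PV name, $2 = Retain or Delete
set_reclaim_policy() {
  kubectl patch pv "$1" -p "{\"spec\":{\"persistentVolumeReclaimPolicy\":\"$2\"}}"
}
```

For example, set_reclaim_policy [pv-name] Retain keeps the underlying data around after the claim is deleted, which can be useful before risky maintenance.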

How to Create a PersistentVolumeClaim (PVC) and Bind to a PV

Let’s quickly check how PVs and PVCs work. This walkthrough is based on the full PV tutorial in the Kubernetes documentation.

1. Setting Up a Node

To use this tutorial, set up a Kubernetes cluster with only one node. Ensure your kubectl command line can communicate with the control plane. On the node, create a directory as follows:
# sudo mkdir /mnt/data
Within the directory, create an index.html file.
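
One possible way to create that file is sketched below. The helper name and the page text are assumptions made for illustration; any content works, as long as you can recognize it later when it is served from the pod.

```shell
# Hypothetical helper: write a small test page into the directory that will back the PV.
# $1 = target directory (defaults to /mnt/data, the directory created above)
write_test_page() {
  dir="${1:-/mnt/data}"
  printf '%s\n' 'Hello from Kubernetes storage' > "$dir/index.html"
}
```

Because /mnt/data was created with sudo, the function needs to run as root. An equivalent one-liner is: sudo sh -c "echo 'Hello from Kubernetes storage' > /mnt/data/index.html"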

2. Creating PersistentVolume

Let’s create a YAML file defining a PersistentVolume:

apiVersion: v1
kind: PersistentVolume
metadata:
  name: task-pv-volume
  labels:
    type: local
spec:
  storageClassName: manual
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteOnce
  hostPath:
    path: "/mnt/data"

Run the following command to create the PersistentVolume in the cluster:

# kubectl apply -f https://k8s.io/examples/pods/storage/pv-volume.yaml

3. Creating PersistentVolumeClaim and Bind to PV

Now, let’s create a PersistentVolumeClaim that requests a PV with the following criteria, which match the PV we created earlier:

  • Storage capacity of at least 3 GiB
  • Read-write access from a single node (ReadWriteOnce)

Let’s create a YAML file for the PVC:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: task-pv-claim
spec:
  storageClassName: manual
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 3Gi

Run this command to apply the PVC:

# kubectl apply -f https://k8s.io/examples/pods/storage/pv-claim.yaml

As soon as you create the PVC, the Kubernetes control plane starts looking for an appropriate PV. When it finds one, it binds the PVC to the PV.

Run this command to see the status of the PV we created earlier:

# kubectl get pv task-pv-volume
The output should look like this, indicating binding was successful:

NAME            CAPACITY  ACCESSMODES  RECLAIMPOLICY  STATUS  CLAIM   ...
task-pv-volume  10Gi      RWO          Retain         Bound   default/task-pv-claim
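
The same check can be made from the claim’s side. The sketch below reads the PVC’s phase, which is Bound once binding succeeds and Pending while no matching PV has been found. The function name is hypothetical.

```shell
# Hypothetical helper: succeed only if the named PVC reports phase "Bound".
# $1 = PVC name
pvc_is_bound() {
  phase="$(kubectl get pvc "$1" -o jsonpath='{.status.phase}')"
  [ "$phase" = "Bound" ]
}
```

Usage: pvc_is_bound task-pv-claim && echo "claim is bound". A claim that stays Pending is usually the first sign of a storage class, capacity, or access mode mismatch between the PVC and the available PVs.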

4. Creating A Pod and Mounting the PVC

The final step is to create a pod that uses your PVC. Run a pod with an NGINX image, and specify the PVC we created earlier in the relevant part of the pod specification:

spec:
  volumes:
    - name: task-pv-storage
      persistentVolumeClaim:
        claimName: task-pv-claim
  containers:
    - name: task-pv-container
      ...
      volumeMounts:
        - mountPath: "/usr/share/nginx/html"
          name: task-pv-storage

Bash into your pod, install curl and run the command curl http://localhost/. The output should show the content of the index.html file you created in step 1. This shows that the new pod was able to access the data in the PV via the PersistentVolumeClaim.
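
The check above can also be run non-interactively. This is a minimal sketch, assuming the pod runs the Debian-based NGINX image (so apt-get is available) and that you pass in the pod name from your own manifest; the function name is hypothetical.

```shell
# Hypothetical helper: install curl inside the pod and fetch the NGINX index page.
# $1 = pod name (an assumption; use the name from your own pod manifest)
fetch_index_from_pod() {
  kubectl exec "$1" -- sh -c 'apt-get update && apt-get install -y curl && curl -s http://localhost/'
}
```

If the PVC is mounted correctly, the last line of output is the content of the index.html file from step 1.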

Kubernetes PVC Errors

The Kubernetes PVC is a complex mechanism, and can result in errors that are difficult to diagnose and resolve. In general, PVC errors are related to three broad categories:

  • Persistent Volume creation issue: Kubernetes had a problem creating the persistent volume or enabling access to it, even though the underlying storage resources exist.
  • Persistent Volume provisioning issue: Kubernetes could not create the required persistent volume because storage resources were unavailable.
  • Changes in specs: Kubernetes had a problem connecting a pod to the required Persistent Volume because of a configuration change in the PV or PVC.

All of these issues can happen at different stages of the PVC lifecycle. We’ll review a few common errors you might encounter:

  • FailedAttachVolume
  • FailedMount
  • CrashLoopBackOff caused by PersistentVolumeClaim

FailedAttachVolume and FailedMount Errors

FailedAttachVolume and FailedMount are two errors that indicate a pod had a problem mounting a PV. There is a difference between these two errors:

  • FailedAttachVolume: occurs when a volume cannot be detached from a previous node to be mounted on the current one.
  • FailedMount: occurs when a volume cannot be mounted on the required path. If the FailedAttachVolume error occurred, FailedMount will also occur as a result. But it is also possible that the volume is available, but there was a specific issue mounting on the path required.

Common causes

Cause                                                Possible Errors
Failure on the new node                              FailedMount
Incorrect access mode defined on the new node        FailedMount
New node has too many disks attached                 FailedMount
New node does not have enough mount points           FailedMount
Network partitioning error                           FailedMount
Incorrect path specified                             FailedMount
Service API call failure                             FailedAttachVolume, FailedMount
Failure of storage infrastructure on previous node   FailedAttachVolume, FailedMount

Diagnosing the Problem

To diagnose why the FailedAttachVolume and FailedMount issues occurred, run the command:

# kubectl describe pod [name]

In the output, look at the Events section. Look for a message indicating one of the errors and the cause.

Events:
Type    Reason              Age   From     Message
----    ------              ----  ----     -------
Warning FailedAttachVolume  5m   kubelet  FailedAttachVolume Multi-Attach error for volume "pvc-xxxxxxxxxxxx" Volume is already exclusively attached to one node and can’t be attached to another
Warning FailedMount         5m   kubelet  Unable to mount volumes for pod "sample-pod":  timeout expired waiting for volumes to attach/mount for pod "sample-pod".
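
On a busy pod the Events section can be long. A small filter like the one below, a sketch with a hypothetical function name, keeps only the two volume-related reasons discussed here.

```shell
# Hypothetical helper: keep only volume attach/mount failures from text on stdin.
# Exits non-zero (grep's behavior) when no such event is present.
volume_failures() {
  grep -E 'FailedAttachVolume|FailedMount'
}
```

Usage: kubectl describe pod [name] | volume_failures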

Resolving the Problem

Kubernetes cannot always recover from the FailedAttachVolume and FailedMount errors on its own, so sometimes you have to take manual steps.

If the problem is Failure to Detach:
Use the storage provider’s interface to detach the volume manually. For example, in AWS you can use the following CLI command to detach a volume from a node:

# aws ec2 detach-volume --volume-id [persistent-volume-id] --force

If the problem is Failure to Attach or Mount:
The easiest problem to fix is an error in the mount configuration. Check for a wrong network path or a network partitioning issue that is preventing the PV from mounting.

Next, try to force Kubernetes to run the pod on another node. The PV may be able to mount there. Here are a few options for moving the pod:

  • Mark a node as unschedulable via the kubectl cordon command.
  • Run kubectl delete pod. This will usually cause Kubernetes to run the pod on another node.
  • Use node selectors, affinity, or taints, to specify that the pod should schedule on another node.
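
The first two options are often combined. The sketch below cordons the node and then deletes the pod, assuming the pod is managed by a controller such as a Deployment that will recreate it; the function name is hypothetical.

```shell
# Hypothetical helper: push a pod off a node by cordoning the node, then deleting the pod.
# $1 = node name, $2 = pod name
# Assumes a controller (e.g. a Deployment) recreates the pod on a schedulable node.
reschedule_pod() {
  kubectl cordon "$1" &&
  kubectl delete pod "$2"
}
```

Once the pod is running elsewhere and the node has been checked, make the node schedulable again with kubectl uncordon [node-name].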

If you do not have other available nodes, or you tried the above and the problem recurs, try to resolve the problem on the node:

  • Reduce the number of disk partitions or add mount points
  • Check access mode on the new node
  • Identify and resolve issues with underlying storage

CrashLoopBackOff Errors Caused by PersistentVolumeClaim

The CrashLoopBackOff error means that a pod repeatedly crashes, restarts, and crashes again. This error can happen for a variety of reasons: see our guide to CrashLoopBackOff. However, it can also happen due to a corrupted PersistentVolumeClaim.

Diagnosing the Problem

To identify if CrashLoopBackOff is caused by a PVC, do the following:

  • Check logs from the previous container instance
  • Check Deployment logs
  • Failing the above, bash into the container and identify the issue
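
The first two checks map onto kubectl logs. This is a minimal sketch with hypothetical function names; the pod and deployment names are placeholders for your own.

```shell
# Hypothetical helper: logs from the previous (crashed) container instance of a pod.
# $1 = pod name
previous_logs() {
  kubectl logs "$1" --previous
}

# Hypothetical helper: logs from a pod belonging to a Deployment.
# $1 = deployment name
deployment_logs() {
  kubectl logs "deployment/$1"
}
```

In the output, look for file system errors on the mounted path, such as permission or read/write failures, which point at the volume rather than at the application.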

Resolving the Problem

If CrashLoopBackOff is due to an issue with a PVC, try the following:

  1. Scale the failed deployment to 0 using the following command. This ensures no other entities on the cluster are writing to the PV during your maintenance.
    # kubectl scale deployment [deployment-name] --replicas=0
  2. Get the Deployment configuration to identify which PVC it uses:
    # kubectl get deployment [deployment-name] -o jsonpath="{.spec.template.spec.volumes[*].persistentVolumeClaim.claimName}"
  3. The output of the previous command is the name of the PVC the deployment uses. Create a debugging pod, for example from the BusyBox image, that mounts the same PVC.
  4. Run a shell in the debugging pod using this command:
    # kubectl exec -it volume-debugger -- sh
  5. Inspect the volume mounted in the /data directory (assuming the debugging pod mounts the PVC there) and resolve the issue.
  6. Exit the shell and delete the debugger pod.
  7. Scale the deployment up again using this command (setting the replicas argument to the required number of replicas).
    # kubectl scale deployment [deployment-name] --replicas=1
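
The debugging pod from step 3 could look like the sketch below. The helper name, the container name, and the volume name are assumptions for illustration; the pod name volume-debugger and the /data mount path match the steps above.

```shell
# Hypothetical helper: print a manifest for a BusyBox pod that mounts an existing PVC at /data.
# $1 = PVC name (the output of step 2)
debugger_manifest() {
  cat <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: volume-debugger
spec:
  containers:
    - name: debugger
      image: busybox
      command: ["sleep", "3600"]
      volumeMounts:
        - mountPath: /data
          name: volume-to-debug
  volumes:
    - name: volume-to-debug
      persistentVolumeClaim:
        claimName: $1
EOF
}
```

Usage: debugger_manifest [pvc-name] | kubectl apply -f - creates the pod. The sleep command only keeps the container alive long enough to open a shell in it; remove the pod afterwards with kubectl delete pod volume-debugger.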