Tuesday, November 5, 2024

Kubernetes CrashLoopBackOff – How to Troubleshoot

The previous article covered the ImagePullBackOff error; now let’s look at another familiar Kubernetes error. If you work with Kubernetes, this is one of the more annoying errors, and you will probably run into it more than once. CrashLoopBackOff is one of the most common errors in Kubernetes. It indicates that a pod keeps crashing in an endless loop: its container either cannot start at all or fails shortly after starting, and is restarted again and again.

In this post we will see how to identify the cause of a CrashLoopBackOff error and how to resolve it.
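
The "BackOff" part of the name refers to the delay the kubelet inserts between restart attempts. The Kubernetes documentation describes it as an exponential back-off that starts at 10 seconds, doubles after each crash, and is capped at five minutes. A minimal Python sketch of that schedule follows; the function name and default arguments are illustrative assumptions, not part of any Kubernetes API.

```python
def backoff_delays(restarts, base=10, cap=300):
    """Return the approximate wait, in seconds, before each restart attempt.

    Assumes the documented kubelet behaviour: the delay starts at `base`,
    doubles after every crash, and never exceeds `cap`.
    """
    delays = []
    delay = base
    for _ in range(restarts):
        delays.append(min(delay, cap))
        delay *= 2
    return delays


if __name__ == "__main__":
    print(backoff_delays(7))  # [10, 20, 40, 80, 160, 300, 300]
```

This is why a pod that has been crashing for a while appears to sit idle for minutes between attempts.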

Why does CrashLoopBackOff occur?

The CrashLoopBackOff error can occur for various reasons, including:

  • Insufficient resources: a lack of resources such as memory prevents the container from starting
  • Locked file/database/port: a resource is already locked by another container
  • Missing reference or misconfiguration: the container refers to scripts or binaries that are not present in it, or the underlying system is misconfigured (for example, a read-only filesystem)
  • Config loading/setup error: the server cannot load its configuration file, or an initial setup step such as an init container fails
  • Connection issues: DNS (kube-dns) cannot resolve or connect to an external service
  • Downstream service: one of the downstream services the application relies on (database, backend, etc.) can’t be reached, or the connection fails
  • Liveness probes: the liveness probe is misconfigured or fails for some other reason
  • Port already in use: two or more containers try to use the same port, which doesn’t work if they belong to the same Pod
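
The last cause in the list can sometimes be spotted in the manifest before deployment. The sketch below assumes the Pod's spec section has already been loaded into a Python dict and reports every containerPort declared by more than one container. The function name and the sample spec are hypothetical. Because containerPort is largely informational, the check can miss conflicts between ports that are never declared, and it ignores the protocol.

```python
from collections import defaultdict


def duplicate_ports(pod_spec):
    """Map each containerPort declared more than once to the containers using it.

    `pod_spec` is the `spec` section of a Pod manifest loaded as a dict.
    """
    users = defaultdict(list)
    for container in pod_spec.get("containers", []):
        for port in container.get("ports", []):
            users[port["containerPort"]].append(container["name"])
    return {port: names for port, names in users.items() if len(names) > 1}


if __name__ == "__main__":
    # Hypothetical Pod spec: both containers declare port 8080.
    spec = {
        "containers": [
            {"name": "web", "ports": [{"containerPort": 8080}]},
            {"name": "sidecar", "ports": [{"containerPort": 8080}, {"containerPort": 9090}]},
        ]
    }
    print(duplicate_ports(spec))  # {8080: ['web', 'sidecar']}
```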

How to Diagnose CrashLoopBackOff

The best way to find the root cause is to go through the list of potential causes and check them one by one, starting with the easiest. It also helps to understand the environment well: how the application is configured, which ports it uses, whether it has any mount points, which probes are configured, and so on.

Back Off Restarting Failed Container

The first step is to collect details about the failure by running kubectl describe pod [name]. Suppose the pod is failing and the events show messages such as Liveness probe failed and Back-off restarting failed container.

If the Back-off restarting failed container message appears together with Liveness probe failed, one possible cause is a temporary resource overload, for example as a result of an activity spike, that makes the application too slow to answer the probe. In that case, the solution is to adjust periodSeconds or timeoutSeconds to give the application a longer window of time to respond.
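
To get a feel for how much time these settings give the application, here is a rough Python estimate. It assumes the kubelet restarts the container after failureThreshold consecutive failed liveness probes, run once every periodSeconds, with the Kubernetes defaults of 10 and 3. The function name is an illustrative assumption, and the result is only an approximation.

```python
def restart_window_seconds(period_seconds=10, failure_threshold=3):
    """Rough time an unresponsive container survives before being restarted.

    The kubelet restarts the container after `failure_threshold` consecutive
    failed liveness probes, one every `period_seconds`. The real timing also
    depends on timeoutSeconds and on when the first failure happens, so this
    is an estimate only.
    """
    return period_seconds * failure_threshold


if __name__ == "__main__":
    print(restart_window_seconds())       # 30, with the Kubernetes defaults
    print(restart_window_seconds(20, 3))  # 60, after doubling periodSeconds
```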

Check the logs

If the previous step does not provide enough detail to identify the cause, the next step is to pull a more detailed explanation of what is happening from the failing pod itself.

To do that, run kubectl get pods to identify the pod that is showing the CrashLoopBackOff error. Then run the following command to get the log of the pod:

kubectl logs PODNAME

Walk through the errors to identify why the pod is repeatedly crashing. The log comes from the application running inside the pod, so it can reveal configuration errors, readiness issues, and similar problems. If the container has already restarted, kubectl logs PODNAME --previous shows the log of the previous, crashed instance.
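
When many pods are running, the two steps above can be scripted. The following Python sketch assumes the default table printed by kubectl get pods (first three columns NAME, READY, STATUS) and builds a kubectl logs command for each crashing pod. The function names and the sample pod names are hypothetical.

```python
def crashlooping_pods(get_pods_output):
    """Return names of pods whose STATUS column is CrashLoopBackOff.

    Expects the default table printed by `kubectl get pods`, whose first
    three columns are NAME, READY and STATUS.
    """
    names = []
    for line in get_pods_output.splitlines()[1:]:  # skip the header row
        columns = line.split()
        if len(columns) >= 3 and columns[2] == "CrashLoopBackOff":
            names.append(columns[0])
    return names


def logs_command(pod_name, previous=True):
    """Build the kubectl command that fetches a pod's logs."""
    command = ["kubectl", "logs", pod_name]
    if previous:
        # --previous asks for the logs of the last terminated container instance.
        command.append("--previous")
    return command


if __name__ == "__main__":
    # Hypothetical output; the pod names are made up for illustration.
    sample = """NAME        READY   STATUS             RESTARTS   AGE
web-7d9f    1/1     Running            0          3d
worker-5c2  0/1     CrashLoopBackOff   12         40m
"""
    for pod in crashlooping_pods(sample):
        print(" ".join(logs_command(pod)))  # kubectl logs worker-5c2 --previous
```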

Check Deployment Logs

Run the following command to retrieve the logs of the deployment:

kubectl logs -f deploy/[deployment-name] -n [namespace]

This may also provide clues about issues at the application level. For example, the log may show that ./data can’t be mounted, likely because it’s already in use and locked by a different container.

Resource limit

You may be experiencing CrashLoopBackOff errors because the container does not have enough memory. You can increase the memory limit by changing resources.limits for the container in the manifest.
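
One way to confirm a memory problem is the last state that kubectl describe pod reports for the container: a container killed for exceeding its memory limit typically shows the reason OOMKilled with exit code 137. By convention, exit codes above 128 mean the process was killed by signal number (code - 128), and 137 corresponds to SIGKILL. The Python sketch below applies that convention. The function name and the short signal table are illustrative assumptions and cover only a few common Linux signals.

```python
# A few common Linux signal numbers; not an exhaustive table.
SIGNAL_NAMES = {6: "SIGABRT", 9: "SIGKILL", 11: "SIGSEGV", 15: "SIGTERM"}


def describe_exit_code(code):
    """Give a short, conventional interpretation of a container exit code."""
    if code == 0:
        return "exited normally"
    if code > 128:
        # Codes above 128 conventionally mean "killed by signal (code - 128)".
        number = code - 128
        return "killed by " + SIGNAL_NAMES.get(number, f"signal {number}")
    return "application error"


if __name__ == "__main__":
    print(describe_exit_code(137))  # killed by SIGKILL
    print(describe_exit_code(1))    # application error
```

An exit code of 137 alone does not prove an out-of-memory kill, since anything that sends SIGKILL produces it, so check the reported reason as well.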

Issue with image

If the issue persists, another possible reason is that the Docker image you are using is not working properly. Make sure the image works when you run it on its own. If it does, but it still fails on Kubernetes, you may need a more advanced approach to find out what is happening. Try the following steps.

Step 1: Identify entrypoint and cmd

You will need to identify the entrypoint and cmd to gain access to the container for debugging. Do the following:

  1. Run docker pull [image-id] to pull the image.
  2. Run docker inspect [image-id] and locate the entrypoint and cmd for the container image.
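
The output of docker inspect is a JSON array, and for an image the two values sit under Config.Entrypoint and Config.Cmd. A small Python sketch that pulls them out is shown below. The function name and the trimmed sample output are hypothetical.

```python
import json


def entrypoint_and_cmd(inspect_output):
    """Extract Entrypoint and Cmd from the JSON printed by `docker inspect`."""
    image = json.loads(inspect_output)[0]  # docker inspect prints a JSON array
    config = image.get("Config", {})
    # Either value may be null in the JSON, so fall back to an empty list.
    return config.get("Entrypoint") or [], config.get("Cmd") or []


if __name__ == "__main__":
    # Trimmed, hypothetical output; a real image reports many more fields.
    sample = '[{"Config": {"Entrypoint": ["/docker-entrypoint.sh"], "Cmd": ["nginx", "-g", "daemon off;"]}}]'
    entrypoint, cmd = entrypoint_and_cmd(sample)
    print(entrypoint)  # ['/docker-entrypoint.sh']
    print(cmd)         # ['nginx', '-g', 'daemon off;']
```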

Step 2: Change entrypoint

Because the container has crashed and cannot start, you’ll need to temporarily change the entrypoint in the container specification to tail -f /dev/null.

spec:
  containers:
    - command:
        - "tail"
        - "-f"
        - "/dev/null"

Step 3: Check for the cause

With the entrypoint changed, the container stays running, so you can use kubectl exec to open a shell inside the failing container. Once you are in the container, check each likely problem area, confirm that everything is in order, and fix any issue you find.

Step 4: Check for missing packages or dependencies

Once you are logged in, check whether any packages or dependencies are missing and preventing the application from starting. If so, provide the missing files to the application and see if this resolves the error.

Step 5: Check application configuration

Inspect your environment variables and verify if they’re correct.

If that isn’t the problem, then perhaps your configuration files are missing or referenced incorrectly, which could cause the application to fail. You can download the missing files or correct the path that refers to them.

If any configuration changes are required, such as the username and password in the database configuration file, making them could resolve the issue.

If the issue was not with missing files or configuration, you’ll need to look for some of the less common reasons for the issue. Below are a few examples of what these may look like.
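
A quick way to check environment variables from inside the container is to compare what the application expects against what is actually set. This is a minimal Python sketch, assuming Python is available in the image. The variable names are hypothetical and should be replaced with the ones your application reads.

```python
import os


def missing_variables(required, environment=None):
    """Return the required variable names that are unset or empty."""
    environment = os.environ if environment is None else environment
    return [name for name in required if not environment.get(name)]


if __name__ == "__main__":
    # Hypothetical names; replace them with the variables your application reads.
    required = ["DB_HOST", "DB_USER", "DB_PASSWORD"]
    print(missing_variables(required, {"DB_HOST": "db", "DB_USER": ""}))
    # ['DB_USER', 'DB_PASSWORD']
```

Called without the second argument, the function checks the real environment of the process.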

Other reasons?

Issue with External Services (DNS Error)

Sometimes, the CrashLoopBackOff error is caused by an issue with one of the third-party services. Check the syslog and other container logs to see whether the error was caused by any of the issues mentioned earlier as causes of CrashLoopBackOff (e.g., locked or missing files). If not, then the problem could be with one of the third-party services.

To verify this, you’ll need to use a debugging container. A debug container gives you a shell that can be used to get into the environment of the failing container. This works because both containers share a similar environment, so they behave the same way. One such shell you can use is ubuntu-network-troubleshooting.

Using the shell, log into your failing container and begin debugging as you normally would. Start by checking the kube-dns configuration, since many third-party service issues start with incorrect DNS settings.
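
A first DNS check can be as simple as asking the resolver for the names the application depends on. The Python sketch below assumes Python is available in the debug container. The function name is an illustrative assumption, the first hostname assumes the default cluster domain, and the second is a hypothetical external dependency.

```python
import socket


def can_resolve(hostname):
    """Return True if the local resolver can turn `hostname` into an address."""
    try:
        socket.getaddrinfo(hostname, None)
        return True
    except socket.gaierror:
        return False


if __name__ == "__main__":
    # kubernetes.default.svc.cluster.local is the usual in-cluster name of the
    # API server Service; the second name is a hypothetical external dependency.
    for name in ("kubernetes.default.svc.cluster.local", "db.example.com"):
        print(name, "resolves" if can_resolve(name) else "does NOT resolve")
```

If the in-cluster name fails to resolve, the problem is more likely in the cluster DNS setup than in the external service.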

Container Failure due to Port Conflict

Let’s take another example in which the container failed due to a port conflict. To identify the issue, you can pull the logs of the failed container by running docker logs [container id].

Doing this will let you identify the conflicting service. Using netstat -tupln, look for the process that corresponds to that service and stop it with the kill command. Then delete the failed pod (in this example, the kube-controller-manager pod) so that it is restarted.
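
On a busy node the netstat output can be long, so it helps to filter it by port. The Python sketch below assumes the usual netstat -tupln layout, in which the fourth column is the local address and the last column is PID/Program name. The function name and the sample lines are hypothetical.

```python
def listeners_on_port(netstat_output, port):
    """Return the PID/Program entries listening on `port`.

    Expects the text printed by `netstat -tupln`, where the fourth column is
    the local address and the last column is "PID/Program name".
    """
    owners = []
    for line in netstat_output.splitlines():
        columns = line.split()
        if len(columns) < 6 or not columns[0].startswith(("tcp", "udp")):
            continue  # header or unrelated line
        local_port = columns[3].rsplit(":", 1)[-1]
        if local_port == str(port):
            owners.append(columns[-1])
    return owners


if __name__ == "__main__":
    # Hypothetical output for illustration only.
    sample = """Proto Recv-Q Send-Q Local Address   Foreign Address  State   PID/Program name
tcp   0      0      0.0.0.0:9000    0.0.0.0:*        LISTEN  1234/some-service
tcp6  0      0      :::8080         :::*             LISTEN  5678/java
"""
    print(listeners_on_port(sample, 8080))  # ['5678/java']
```

The number before the slash is the PID to pass to kill. Without root privileges netstat prints a dash in that column instead.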

How to Prevent the CrashLoopBackOff Error

The error cannot always be avoided, since it can have many different causes. What we can do is put a few important practices in place to prevent it, or keep a runbook of them so that the checks are done consistently.

1. Check your configuration and variables

Configuration is critical for any application, and any problem with it can cause the application to fail. Always make sure the configuration files are placed correctly and referenced properly in the manifests, and that their contents are complete and hold the correct values. If the application uses environment variables, make sure they are referenced correctly and have up-to-date values. Configuration management tools, or keeping all configuration in a single source of truth, help avoid this kind of issue.

2. Dependency or external service

If an application uses an external service and calls to that service fail, then the service itself may be the problem. Most of these errors are due to an SSL certificate problem or network issues, so make sure those are functioning correctly. You can log into the container and reach the endpoints manually using curl. Also check that kube-dns is working properly: if a name fails to resolve, the dependency cannot be reached at all.

3. Check File(system)

As mentioned before, file locks are a common reason for the CrashLoopBackOff error. Check that your volumes are configured properly and that the volume claims have the required permissions. For example, if your application expects to read and write, make sure that access is granted and configured.

Hope this is useful. With these steps you should be able to identify most of the possible causes. There may be others beyond these that are specific to your environment.
