We covered the ImagePullBackOff error in the last article; now let's take a look at another familiar error in Kubernetes. If you work with Kubernetes, this may be one of the more annoying errors you run into, and you will likely see it multiple times: CrashLoopBackOff. It is one of the most common errors in Kubernetes, indicating that a pod is crashing in an endless loop, repeatedly failing to start. In this post we will see how to identify the cause of the CrashLoopBackOff error, why it occurs, and how you can solve it.
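For context, this is roughly what the error looks like in the pod list (the pod name and timings here are purely illustrative):

```
$ kubectl get pods
NAME                     READY   STATUS             RESTARTS   AGE
myapp-5d9c7b6f4d-x2k8p   0/1     CrashLoopBackOff   4          2m30s
```

Note the growing RESTARTS count: Kubernetes keeps restarting the container, with an increasing back-off delay between attempts.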
Why does CrashLoopBackOff occur?
The CrashLoopBackOff error can occur for various reasons, including:
- Insufficient resources: a lack of resources prevents the container from loading
- Locked file/database/port: a resource is already locked by another container
- Missing reference/configuration: the container references scripts or binaries that are not present in it, or something in the underlying system is misconfigured, such as a read-only filesystem
- Config loading/setup error: the server cannot load its configuration file, or initial setup such as an init container fails
- Connection issues: DNS or kube-dns cannot connect to an external service
- Downstream service: one of the downstream services the application relies on (database, backend, etc.) can't be reached, or the connection fails
- Liveness probes: the liveness probe is misconfigured or fails for some reason
- Port already in use: two or more containers are using the same port, which doesn't work if they're in the same Pod
How to Diagnose CrashLoopBackOff
The best way to identify the root cause is to work through the list of potential causes one by one, starting with the easiest to check. Another basic requirement is a good understanding of the environment: what the configuration looks like, which ports are used, whether there are any mount points, which probes are configured, and so on.
Back Off Restarting Failed Container
The first troubleshooting step is to collect details about the issue by running kubectl describe pod [name]. Say the pod is failing, with events such as Liveness probe failed and Back-off restarting failed container.

If you get the back-off restarting failed container message, you are often dealing with a temporary resource overload caused by an activity spike. The solution is to adjust periodSeconds or timeoutSeconds in the probe definition to give the application a longer window of time to respond.
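As a sketch, these settings live in the probe definition; the endpoint, port, and values below are assumptions you should adapt to your application:

```yaml
livenessProbe:
  httpGet:
    path: /healthz          # hypothetical health endpoint
    port: 8080              # hypothetical application port
  initialDelaySeconds: 30   # wait before the first probe fires
  periodSeconds: 20         # probe less aggressively
  timeoutSeconds: 5         # allow a slower response before failing
```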
Check the logs
If the previous step does not provide enough detail, the next step is to pull a fuller explanation of what is happening from the failing pod itself.

Run kubectl get pods to identify the pod exhibiting the CrashLoopBackOff error, then run the following command to get its logs:
kubectl logs PODNAME
Walk through the errors to identify why the pod is repeatedly crashing. The logs may include additional detail from the application running inside the pod, which can reveal a configuration error, a readiness issue, or something similar.
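One tip: because the container keeps restarting, the current instance may not have logged anything useful yet. The logs of the previous, crashed instance are often more revealing:

kubectl logs PODNAME --previous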
Check Deployment Logs
Run the following command to retrieve the deployment logs:

kubectl logs -f deploy/[deployment-name] -n [namespace]

This may also provide clues about issues at the application level. For example, the log may show that ./data can't be mounted, likely because it's already in use and locked by a different container.
Resource limit
You may be experiencing CrashLoopBackOff errors due to insufficient memory. You can increase the memory limit by changing resources.limits in the container's manifest.
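As a minimal sketch, the limits sit under resources in the container spec; the values here are placeholders to size for your workload:

```yaml
resources:
  requests:
    memory: "128Mi"
    cpu: "250m"
  limits:
    memory: "512Mi"   # raise this if the container is being OOM-killed
    cpu: "500m"
```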
Issue with image
If there is still an issue, another possible reason is that the Docker image you are using is not working properly. Make sure the image runs fine on its own, outside the cluster. If it works standalone but fails under Kubernetes, you'll need a more advanced approach to find out what is happening. Try the following.
Step 1: Identify entrypoint and cmd
You will need to identify the entrypoint and cmd to gain access to the container for debugging. Do the following:
- Run docker pull [image-id] to pull the image.
- Run docker inspect [image-id] and locate the entrypoint and cmd for the container image.
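If you only need those two fields, you can filter the inspect output directly (assuming a standard Docker CLI):

docker inspect --format '{{.Config.Entrypoint}} {{.Config.Cmd}}' [image-id]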
Step 2: Change entrypoint
Because the container has crashed and cannot start, you'll need to temporarily change the entrypoint in the container specification to tail -f /dev/null so the container stays running:
```yaml
spec:
  containers:
  - command:
    - "tail"
    - "-f"
    - "/dev/null"
```
Step 3: Check for the cause
With the entrypoint changed, the container will start and stay up, and you can use kubectl to exec into the problem container. Once inside, check all the possible causes, validate that everything is in order, and fix anything you find.
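For example, assuming the image ships a shell at /bin/sh:

kubectl exec -it PODNAME -- /bin/sh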
Step 4: Check for missing packages or dependencies
Once logged in, check whether any packages or dependencies are missing and preventing the application from starting. If anything is missing, provide the missing files to the application and see if this resolves the error.
Step 5: Check application configuration
Inspect your environment variables and verify that they are correct (a quick way to dump them is shown below). If that isn't the problem, perhaps your configuration files are missing or referenced by the wrong path, which could cause the application to fail; supply the missing files or fix the references. If any configuration changes are required, such as the username and password in the database configuration file, those could resolve the error.
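With the container kept alive as described above, you can dump its environment and compare it against what the application expects (this assumes printenv exists in the image):

kubectl exec PODNAME -- printenv | sort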
If the issue was not missing files or configuration, you'll need to look at some of the less common reasons. Below are a few examples of what these may look like.
Other reasons?
Issue with External Services (DNS Error)
Sometimes, the CrashLoopBackOff error is caused by an issue with one of the third-party services. Check the syslog and other container logs to see whether it was caused by one of the issues already mentioned as causes of CrashLoopBackOff (e.g., locked or missing files). If not, the problem could be with one of the third-party services themselves.
To verify this, you'll need a debugging container. A debug container works as a shell that you can use to log into the failing container. This works because both containers share a similar environment, so their behaviour is the same. One such shell you can use is ubuntu-network-troubleshooting.
Using the shell, log into your failing container and begin debugging as you normally would. Start by checking the kube-dns configuration, since a lot of third-party issues begin with incorrect DNS settings.
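A quick way to test DNS from inside the cluster is a throwaway pod; kubernetes.default is the in-cluster API service and should always resolve (busybox:1.28 is used here because its nslookup is known to behave well):

kubectl run -it --rm dns-test --image=busybox:1.28 --restart=Never -- nslookup kubernetes.default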
Container Failure due to Port Conflict
Let's take another example, in which the container fails due to a port conflict. To identify the issue, pull the logs of the failed container by running docker logs [container-id].

Doing this will let you identify the conflicting service. Using netstat -tupln, look for the process that is holding the port and kill it with the kill command, then delete the kube-controller-manager pod and let it be recreated.
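Put together, the sequence looks roughly like this; the container ID, port, and PID are placeholders:

```
docker logs [container-id]      # find the error, e.g. "address already in use"
netstat -tupln | grep [port]    # find the PID holding the conflicting port
kill [pid]                      # free the port
```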
How to Prevent the CrashLoopBackOff Error
We cannot always prevent the issue, as it can happen for one reason or another, but the best we can do is implement some important preventive steps and keep a runbook so we know exactly what to check.
1. Check your configuration and variables
Configuration is critical for any application, and any issue with it can cause the application to fail. Always make sure the configuration files are placed correctly and properly referenced in the manifests, and that their contents are valid, with the correct values in place. Likewise, if there are any environment variables, make sure they are referenced correctly and carry up-to-date values. You can use configuration-management tools, or keep everything in a single source of truth, to avoid these kinds of issues, as sketched below.
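One way to keep configuration in a single source of truth is a ConfigMap referenced from the pod spec; this is a minimal sketch with hypothetical names and values:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: myapp-config            # hypothetical name
data:
  DB_HOST: "db.example.local"   # placeholder values
  LOG_LEVEL: "info"
```

In the container spec, envFrom with a configMapRef pointing at myapp-config then exposes every key as an environment variable, so the application and its configuration never drift apart.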
2. Dependency or external service
If an application uses an external service and the calls made to that service fail, the service itself is the problem. Most of these errors are due to an SSL certificate error or network issues, so make sure those are functioning correctly. You can log into the container and manually reach the endpoints using curl, and also check that kube-dns is working properly; if name resolution fails, even healthy dependency services become unreachable.
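For example, from inside the failing container (assuming curl is available in the image; the endpoint is a placeholder):

kubectl exec -it PODNAME -- curl -v https://api.example.local/health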
3. Check File(system)
As mentioned before, file locks are a common reason for the CrashLoopBackOff error. Check that your volumes are configured properly and that the claims have the required permissions. For example, if your application expects to read and write, make sure that access is actually granted and configured.
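As a sketch, check both the mount and the claim; the names here are placeholders:

```yaml
spec:
  containers:
  - name: myapp                # hypothetical container
    image: myapp:latest        # placeholder image
    volumeMounts:
    - name: data
      mountPath: /data
      readOnly: false          # must not be true if the app writes here
  volumes:
  - name: data
    persistentVolumeClaim:
      claimName: myapp-pvc     # the PVC's accessModes must permit read-write
```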
Hope this is useful; with it you should be able to identify most of the possible causes. There could be issues beyond these, possibly related to your specific environment.
You can follow us on social media to get regular updates:
- Facebook: https://www.facebook.com/foxutech/
- Instagram: https://www.instagram.com/foxutech/
- YouTube: Foxutech
- Twitter: https://twitter.com/foxutech
- Medium: FoxuTech – Medium