How to Fix Exit Code 137 | Kubernetes Memory Issues

We have covered Kubernetes troubleshooting in earlier articles. Today we’ll look at another common Kubernetes issue: the exit code 137 error, what it means, what causes it, and how to troubleshoot it.

Exit code 137 errors happen when a container or pod is terminated due to high memory usage. The container or Kubernetes pod is stopped to prevent its excessive memory consumption from affecting your host’s reliability.

You should identify the processes that ended with exit code 137 and investigate them in detail. The simplest explanation is that the container reached the memory limit defined in its manifest, but that is not always the whole story. There might also be a memory leak or sub-optimal programming inside your application that’s causing resources to be consumed excessively, so there are several factors to check before drawing a conclusion.

In this article, you’ll learn how to identify and debug exit code 137 so your containers run reliably. This will reduce your maintenance overhead and help stop inconsistencies caused by services stopping unexpectedly. Although some causes of exit code 137 can be highly specific to your environment, most problems can be solved with a simple troubleshooting sequence. Keep in mind that any problem leading to disruption is a business loss, so we should be cautious and vigilant about the issue and fix it permanently.

What Is Exit Code 137?

Every process emits an exit code when it terminates, whether it exits normally or is killed unexpectedly. Exit codes provide a mechanism for informing the user, operating system, and other applications why the process stopped. Each code is a number between 0 and 255. The meaning of codes below 125 is application-dependent, while higher values have special meanings.

A 137 code means the process was terminated externally with SIGKILL (128 plus signal number 9). In containers, this usually happens because of memory consumption: the operating system’s out-of-memory (OOM) killer intervenes to stop the program before it destabilizes the host.
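To make the arithmetic concrete, here is a minimal Python sketch (POSIX only). It starts a child process that sends SIGKILL to itself, then converts Python’s return code into the shell-style exit code. The helper names `shell_exit_code` and `run_and_kill_child` are ours, chosen for illustration; the child is killed by the script itself, not by the OOM killer.

```python
import subprocess
import sys


def shell_exit_code(returncode: int) -> int:
    """Convert a subprocess return code to the shell convention.

    Python reports "killed by signal N" as -N; shells report it as 128 + N.
    """
    return 128 - returncode if returncode < 0 else returncode


def run_and_kill_child() -> int:
    """Start a child that sends SIGKILL to itself and return its raw return code."""
    child_code = "import os, signal; os.kill(os.getpid(), signal.SIGKILL)"
    result = subprocess.run([sys.executable, "-c", child_code])
    return result.returncode


if __name__ == "__main__":
    raw = run_and_kill_child()
    print("raw return code:", raw)
    print("shell-style exit code:", shell_exit_code(raw))
```

Any SIGKILL produces the same code, so 137 alone does not prove an OOM kill; the OOMKilled status described next is what confirms it.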

Pods running in Kubernetes will show a status of OOMKilled when they encounter a 137 exit code. Although this looks like any other Kubernetes status, it’s caused by the operating system’s OOM killer terminating the pod’s process. You can list your pods to check their status:

# kubectl get pods
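On a busy cluster it can help to filter that output programmatically. The sketch below assumes you have saved the result of `kubectl get pods -o json` and loaded it with `json.load`; it walks each pod’s `containerStatuses` and reports containers whose current or previous termination reason is OOMKilled. The function name and the sample pod names are hypothetical.

```python
def find_oom_killed(pod_list: dict) -> list[tuple[str, str, int]]:
    """Return (pod, container, exit code) for each OOMKilled container."""
    hits = []
    for pod in pod_list.get("items", []):
        pod_name = pod.get("metadata", {}).get("name", "<unknown>")
        for status in pod.get("status", {}).get("containerStatuses", []):
            # A restarted container keeps its previous termination in lastState;
            # a container that has not restarted yet reports it in state.
            for key in ("lastState", "state"):
                terminated = status.get(key, {}).get("terminated")
                if terminated and terminated.get("reason") == "OOMKilled":
                    hits.append((pod_name, status["name"], terminated.get("exitCode")))
                    break
    return hits


# Hypothetical sample, shaped like `kubectl get pods -o json` output.
SAMPLE = {
    "items": [
        {
            "metadata": {"name": "api-7d9f"},
            "status": {
                "containerStatuses": [
                    {
                        "name": "api",
                        "restartCount": 3,
                        "state": {"running": {}},
                        "lastState": {
                            "terminated": {"reason": "OOMKilled", "exitCode": 137}
                        },
                    }
                ]
            },
        },
        {
            "metadata": {"name": "web-5c4b"},
            "status": {
                "containerStatuses": [
                    {"name": "web", "state": {"running": {}}, "lastState": {}}
                ]
            },
        },
    ]
}

if __name__ == "__main__":
    for pod, container, code in find_oom_killed(SAMPLE):
        print(f"{pod}/{container} was OOMKilled (exit code {code})")
```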

Causes of Container Memory Issues

Understanding the situations that lead to memory-related container terminations is the first step towards debugging exit code 137. Here are some of the most common issues that you might experience.

Container Memory Limit Exceeded

Kubernetes pods will be terminated when they try to use more memory than their configured limit allows. 

Fix: You might be able to resolve this situation by increasing the limit if your cluster has additional capacity available. We’ll look at this in detail below.

Application Memory Leak

Poorly optimized code can create memory leaks. A memory leak occurs when an application uses memory but doesn’t release it when the operations complete. This causes the memory to gradually fill up and will eventually consume all the available capacity.
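As an illustration, the Python sketch below contrasts a leaky request handler with a fixed one and measures the difference using the standard library’s `tracemalloc` module. The handlers, the module-level `_request_log` list, and the request sizes are all made up for the example; real leaks are rarely this obvious.

```python
import tracemalloc

_request_log = []  # module-level list that is never cleared: this is the leak


def handle_request_leaky(payload: bytes) -> int:
    """Keep a reference to every payload forever, so memory is never released."""
    _request_log.append(payload)
    return len(payload)


def handle_request_fixed(payload: bytes) -> int:
    """Use the payload and let it be freed once the call returns."""
    return len(payload)


def traced_growth(handler, requests: int = 200, size: int = 10_000) -> int:
    """Return how many bytes remain allocated after serving `requests` calls."""
    tracemalloc.start()
    before, _peak = tracemalloc.get_traced_memory()
    for _ in range(requests):
        handler(bytes(size))
    after, _peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return after - before


if __name__ == "__main__":
    print("leaky handler retained bytes:", traced_growth(handle_request_leaky))
    print("fixed handler retained bytes:", traced_growth(handle_request_fixed))
```

The leaky handler retains roughly the full size of every payload it has ever seen, while the fixed handler retains close to nothing.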

Fix: There is no direct solution for this issue. To fix it, you need to review your recent changes and work out where memory consumption has grown and what is causing it.

Increases in the Load

Sometimes this issue appears when the application receives unexpected load. You may have sized your system resources for a limited number of users, but if demand brings in many more users, more memory will be consumed (especially if the application is memory intensive).

Fix: Set up extensive monitoring so you are alerted when high memory usage is observed, enable supported auto-scaling, or review your architecture and implement a suitable solution.

Requesting More Memory

Kubernetes pods configured with memory resource requests can try to use more memory than the node has available if limits aren’t also used. A request allows consumption overages because it’s only an indication of how much memory a pod will consume and doesn’t prevent the pod from consuming more memory if it’s available.

Running Too Many Containers Without Memory Limits

Running several containers without memory limits can create unpredictable Kubernetes behavior when the node’s memory capacity is reached. Containers without limits have a greater chance of being killed, even if a neighboring container caused the capacity breach.

Preventing Pods and Containers from Causing Memory Issues

Debugging container memory issues in Kubernetes, or any other orchestrator, can seem complex, but using the right tools and techniques helps make it less stressful. If you have been following our articles, you will already be familiar with requests and limits, which are useful in this kind of scenario.

Setting Memory Limits

As we have seen, pods without memory limits increase the chance of OOM kills and exit code 137 errors. These pods are able to try to use more memory than the node is capable of serving, which poses a stability risk. When memory consumption gets close to the physical limit, the Linux kernel OOM killer intervenes to stop processes that are using too much memory.

Making sure each of your pods includes a memory limit is a good first step towards preventing OOM kill issues. Here’s a sample pod manifest:

apiVersion: v1
kind: Pod
metadata:
  name: pod-with-memory-limit
spec:
  containers:
    - name: container-with-memory-limit
      image: nginx:latest
      resources:
        requests:
          memory: "256Mi"
        limits:
          memory: "512Mi"

The requests field indicates the pod wants 256 Mi of memory. Kubernetes will use this information to influence scheduling decisions and will ensure that the pod is hosted by a node with at least 256 Mi of memory available. Requests help to reduce resource contention, ensuring your applications have the resources they need. It’s important to note, though, that they don’t prevent the pod from using more memory if it’s available on the node.

This sample pod also includes a memory limit of 512 Mi. If the container’s memory consumption goes above 512 Mi, it becomes a candidate for termination, and if it keeps consuming memory beyond the limit, it is OOM-killed and exits with code 137. Setting limits on all of your pods helps prevent excessive memory consumption in one from affecting the others.
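To judge how close a container is to its limit, you need the limit in bytes. The Python sketch below converts quantities such as "256Mi" or "512Mi" to bytes and computes usage as a fraction of the limit. It is a simplified parser of our own, not the Kubernetes implementation: it handles whole numbers with the common binary (Ki, Mi, Gi, Ti) and decimal (k, M, G, T) suffixes only, and the 480 MiB usage figure is a made-up example.

```python
import re

# Multipliers for the binary and decimal suffixes handled by this sketch.
_SUFFIXES = {
    "": 1,
    "Ki": 1024, "Mi": 1024**2, "Gi": 1024**3, "Ti": 1024**4,
    "k": 1000, "M": 1000**2, "G": 1000**3, "T": 1000**4,
}


def parse_memory(quantity: str) -> int:
    """Convert a quantity such as "256Mi" or "1G" to bytes.

    Fractional and exponent forms are deliberately out of scope here.
    """
    match = re.fullmatch(r"(\d+)(Ki|Mi|Gi|Ti|k|M|G|T)?", quantity.strip())
    if match is None:
        raise ValueError(f"unsupported memory quantity: {quantity!r}")
    number, suffix = match.groups()
    return int(number) * _SUFFIXES[suffix or ""]


def limit_usage_ratio(usage_bytes: int, limit: str) -> float:
    """Return current usage as a fraction of the configured limit."""
    return usage_bytes / parse_memory(limit)


if __name__ == "__main__":
    usage = 480 * 1024**2  # hypothetical reading: 480 MiB in use
    print(parse_memory("256Mi"))               # request from the manifest, in bytes
    print(limit_usage_ratio(usage, "512Mi"))   # fraction of the 512Mi limit in use
```

A ratio that stays near 1.0 suggests the limit is too tight for the workload or that usage is growing, which is what the next section investigates.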

Beyond setting limits, you need to understand the underlying issue.

Investigating the Application

Now that you have appropriate memory limits, you can start investigating why those limits are being reached. Start by analyzing traffic levels to identify anomalies as well as natural growth in your service. If memory use has grown in correlation with user activity, it could be time to scale your cluster with new nodes, or to add more memory to existing ones.

If your nodes have sufficient memory, you’ve set limits on all your pods, and service use has remained relatively steady, the problem is likely to be within your application. To figure out where, you need to look at the nature of your memory consumption issues: is usage suddenly spiking, or does it gradually increase over the course of the pod’s lifetime?

A memory usage graph that shows large peaks can point to poorly optimized functions in your application. Specific parts of your codebase could be allocating a lot of memory to handle demanding user requests. You can usually work out the culprit by reviewing pod logs (enable debug or trace level logging if needed) to determine which actions were taken around the time of the spike. It might be possible to refactor your code to use less memory, such as by explicitly freeing up variables and destroying objects after you’ve finished using them.

Memory graphs that show continual increases over time usually mean you’ve got a memory leak. These problems can be tricky to find, but reviewing application logs and running language-specific analysis tools can help you discover suspect code. Unchecked memory leaks will eventually fill all the available physical memory, forcing the OOM killer to stop processes so the capacity can be reclaimed.
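If you export memory samples from your monitoring system, the spike-versus-growth distinction can be roughed out in code. The Python sketch below is a simple heuristic of our own, not a standard algorithm: it compares the first, last, and peak values of equally spaced samples, and the 10% tolerance is an arbitrary assumption you would tune for your workload.

```python
def classify_memory_pattern(samples: list[float], tolerance: float = 0.10) -> str:
    """Label memory samples as "steady", "spike", or "gradual growth".

    Heuristic only: a peak well above both endpoints counts as a spike, and
    an end value well above the start value counts as gradual growth.
    """
    if len(samples) < 3:
        raise ValueError("need at least three samples")
    first, last, peak = samples[0], samples[-1], max(samples)
    spiked = peak > max(first, last) * (1 + tolerance)
    grew = last > first * (1 + tolerance)
    if spiked:
        return "spike"
    if grew:
        return "gradual growth"
    return "steady"


if __name__ == "__main__":
    # Hypothetical readings in MiB, taken at a fixed interval.
    print(classify_memory_pattern([100, 105, 98, 102, 101]))
    print(classify_memory_pattern([100, 110, 400, 120, 105]))
    print(classify_memory_pattern([100, 130, 160, 190, 220]))
```

A "spike" result points towards the request-level investigation described above, while "gradual growth" with steady traffic is the signature of a leak. A leak with spikes on top would be labeled "spike" by this sketch, so treat the label as a starting point only.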

Debugging Kubernetes problems manually is time-consuming and error prone. You have to inspect pod status codes and retrieve their logs using terminal commands, which can create delays in your incident response. Kubernetes also lacks a built-in way of alerting you when memory consumption’s growing. You might not know about spiraling resource usage until your pods start to terminate and knock parts of your service offline.

For this, you may need to set up an extensive monitoring solution for your Kubernetes cluster, which will save you time. You can go with open source tools like Prometheus and Grafana, or try services like Datadog, containerHQ, or Splunk. An application monitoring solution is another option.

Conclusion

Exit code 137 means a container or pod is trying to use more memory than it’s allowed. The process gets terminated to prevent memory usage ballooning indefinitely, which could cause your host system to become unstable.

Excessive memory usage can occur due to natural growth in your application’s use, or as the result of a memory leak in your code. It’s important to set correct memory limits on your pods to guard against these issues; while reaching the limit will prompt termination with a 137 exit code, this mechanism is meant to protect you against worse problems that will occur if system memory is depleted entirely.

To understand what is happening in your environment, it is always important to have a monitoring system in place, as we cannot track the status of every system at the same time. Choose a proven, suitable monitoring solution to track your ecosystem and alert you to problems.
