FoxuTech

What Happens When Kubernetes Pod CPU & Memory Run High


Kubernetes, the container orchestration platform of choice for many modern applications, thrives on efficient resource management. But what happens when pods, the fundamental units of deployment, start demanding more CPU or memory than they should? Buckle up, because we’re about to dive into the consequences of resource-hungry pods in the Kubernetes world.

The Two Biggies: CPU Throttling and OOM Killer

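When a container’s CPU usage reaches its limit, it gets throttled. Kubernetes turns the CPU limit into a CFS quota on the container’s cgroup, and the Linux kernel enforces it: once the container has used its allotted CPU time for the current scheduling period, its processes are paused until the next period begins. This acts like a dimmer switch, slowing the container down so it can’t take more CPU than it was given and crowd out other pods. Imagine your pod struggling to run a marathon while wearing heavy boots; that’s CPU throttling in action.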
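Under the hood, the kubelet expresses a CPU limit as a CFS quota: the container may use at most `quota` microseconds of CPU time per scheduling period. The sketch below shows that conversion, assuming the common default period of 100 ms and the kernel’s 1 ms minimum quota. The function name is hypothetical, for illustration only.

```python
CFS_PERIOD_US = 100_000  # assumed default CFS period: 100 ms
MIN_QUOTA_US = 1_000     # the kernel does not accept quotas below 1 ms

def cpu_limit_to_quota_us(limit_millicores: int, period_us: int = CFS_PERIOD_US) -> int:
    """Return the CPU time (in µs) a container may use in each CFS period."""
    if limit_millicores <= 0:
        raise ValueError("limit must be positive")
    # 1000 millicores = one full core = the whole period's worth of CPU time
    quota = limit_millicores * period_us // 1000
    return max(quota, MIN_QUOTA_US)

print(cpu_limit_to_quota_us(500))   # 50000: 50 ms of CPU per 100 ms period
print(cpu_limit_to_quota_us(2000))  # 200000: two cores' worth per period
```

A limit above 1000m yields a quota larger than the period, which only helps a multi-threaded process that can run on several cores at once.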

Memory issues present a different beast. Memory can’t be throttled, so if a container tries to use more memory than its limit, the kernel’s OOM (Out Of Memory) killer swings into action and terminates a process inside that container, which Kubernetes reports as OOMKilled. When the node as a whole runs short of memory, the kernel picks its victims by score, favoring the most memory-hungry, lowest-priority processes. Think of it as the eviction notice your landlord serves when you throw one too many epic house parties.

Let’s look at CPU throttling and memory OOM kills in a bit more detail.

Kubernetes CPU throttling

As mentioned, CPU throttling is the behavior where processes are slowed down once they reach a resource limit.

Similar to the memory case, these limits could be:

Think of the following analogy. We have a highway with some traffic where:

Throttling here is represented as a traffic jam: eventually, all processes will run, but everything will be slower.
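To put numbers on the traffic-jam picture: a container limited to 500m gets 50 ms of CPU per 100 ms period. The rough model below is our own simplification, not the kernel’s exact accounting. It estimates how long a burst of single-threaded CPU work takes in wall-clock time under such a quota, assuming nothing else competes for the CPU.

```python
import math

def throttled_wall_time_ms(cpu_work_ms: float, quota_ms: float,
                           period_ms: float = 100.0) -> float:
    """Rough wall-clock time for a single-threaded burst of CPU work under a CFS quota.

    Simplifying assumptions: the work starts at the beginning of a period, nothing
    else competes for the CPU, and one thread can use at most min(quota, period)
    of CPU time per period.
    """
    if cpu_work_ms <= 0:
        return 0.0
    per_period = min(quota_ms, period_ms)
    # Every period except the last is used up completely; when quota < period,
    # the thread then sits throttled until the period ends.
    full_periods = math.ceil(cpu_work_ms / per_period) - 1
    last_period_work = cpu_work_ms - full_periods * per_period
    return full_periods * period_ms + last_period_work

# 200 ms of work under a 500m limit (50 ms of quota per 100 ms period)
print(throttled_wall_time_ms(200, 50))  # 350.0
```

The work still finishes, but it takes 350 ms instead of 200 ms. That is the slowdown, not a failure, that throttling causes.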

How CPU is handled in Kubernetes

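CPU is handled in Kubernetes with shares. Each CPU core is represented as 1024 shares, and a container’s CPU request is converted into a number of shares. For example, a request of 250m, a quarter of a core, becomes 256 shares. The cgroups (control groups) feature of the Linux kernel then uses these shares to divide CPU time between all running processes.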
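The request-to-shares conversion is simple arithmetic. This sketch assumes cgroup v1 `cpu.shares`, where one CPU maps to 1024 shares and the kernel minimum is 2. cgroup v2 uses `cpu.weight` instead, with a different scale. The constants and function name are our own, for illustration.

```python
SHARES_PER_CPU = 1024    # one full core (1000m) = 1024 shares (cgroup v1)
MILLICPU_PER_CPU = 1000
MIN_SHARES = 2           # smallest value the kernel accepts for cpu.shares

def milli_cpu_to_shares(request_millicores: int) -> int:
    """Convert a CPU request in millicores into cgroup v1 cpu.shares."""
    shares = request_millicores * SHARES_PER_CPU // MILLICPU_PER_CPU
    # Containers with no (or a tiny) request still get the minimum weight
    return max(shares, MIN_SHARES)

for request in (0, 100, 250, 1000):
    print(request, "m ->", milli_cpu_to_shares(request), "shares")
```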

If the CPU can handle all current processes, then no action is needed. If processes demand more than 100% of the available CPU, shares come into play. Like any Linux system, a Kubernetes node relies on the kernel’s CFS (Completely Fair Scheduler), so processes with more shares get proportionally more CPU time.
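Under full contention, shares act as weights. The simplified sketch below splits a node’s CPUs between busy containers in proportion to their shares. It assumes every container wants more CPU than it can get. In reality CFS is work-conserving, so a container that is idle leaves its portion to the others. The container names are hypothetical.

```python
def split_cpu_by_shares(shares: dict[str, int], cpus: float) -> dict[str, float]:
    """Divide `cpus` cores between busy containers in proportion to their shares.

    Assumes full contention: every container can use all the CPU it is given.
    """
    total = sum(shares.values())
    return {name: cpus * s / total for name, s in shares.items()}

print(split_cpu_by_shares({"api": 1024, "worker": 512, "batch": 512}, cpus=2.0))
# {'api': 1.0, 'worker': 0.5, 'batch': 0.5}
```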

Unlike memory, Kubernetes won’t kill Pods because of throttling.

CPU overcommitment

As we saw in the limits and requests article, it’s important to set limits or requests when we want to restrict the resource consumption of our processes. Nevertheless, beware of setting total requests larger than the node’s actual CPU capacity. A request is a guarantee that the container will get that amount of CPU, and a node can’t guarantee more CPU than it has.
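A quick way to reason about this is to compare the sum of CPU requests with the node’s allocatable CPU. The sketch below is a simplified version of the check the scheduler performs. The node size and the 200m reserved for system daemons are assumed example values, not recommendations.

```python
def cpu_request_headroom(requests_m: list[int], allocatable_m: int) -> int:
    """Millicores still unreserved; a negative value means the requests can't all be guaranteed."""
    return allocatable_m - sum(requests_m)

node_allocatable = 3_800       # assumed: a 4-core node with 200m reserved for the system
pods = [1_000, 1_500, 1_000]   # CPU requests in millicores
print(cpu_request_headroom(pods, node_allocatable))  # 300

pods.append(500)
print(cpu_request_headroom(pods, node_allocatable))  # -200: this Pod would not fit here
```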

Monitoring Kubernetes CPU throttling

You can check how close a container’s CPU usage is to its Kubernetes limit using the following PromQL query:

(sum by (namespace,pod,container)(rate(container_cpu_usage_seconds_total{container!=""}[5m])) / sum by (namespace,pod,container)(kube_pod_container_resource_limits{resource="cpu"})) > 0.8

If we want to track the amount of throttling happening in our cluster, cAdvisor provides container_cpu_cfs_throttled_periods_total and container_cpu_cfs_periods_total. Dividing the increase in throttled periods by the increase in total periods gives the percentage of CFS periods in which a container was throttled.
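Both metrics are counters, so the percentage comes from the change between two samples rather than from the raw values. In PromQL this is usually the ratio of rate() over the two counters. The sketch below does the same arithmetic on two hypothetical scrapes. The numbers are illustrative, not real measurements.

```python
def throttled_percent(throttled_before: float, throttled_after: float,
                      periods_before: float, periods_after: float) -> float:
    """Percentage of CFS periods in which a container was throttled between two samples
    of container_cpu_cfs_throttled_periods_total and container_cpu_cfs_periods_total."""
    periods = periods_after - periods_before
    if periods <= 0:
        return 0.0  # no enforcement periods elapsed, or a counter reset
    return 100.0 * (throttled_after - throttled_before) / periods

# Two scrapes a few minutes apart (illustrative values)
print(throttled_percent(1_200, 1_500, 10_000, 13_000))  # 10.0
```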

Kubernetes OOM

Every container in a Pod needs memory to run. Kubernetes memory limits are set per container, either in a Pod definition or in the Pod template of a Deployment.
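Memory limits in those definitions are written as quantities such as 512Mi (mebibytes, powers of two) or 500M (megabytes, powers of ten), and the two are easy to mix up. Below is a minimal parser sketch that handles only the common suffixes. It does not support exponent notation or the rarer P/E suffixes, and the function name is hypothetical.

```python
import re

# Binary (power-of-two) and decimal (power-of-ten) suffixes; a subset of what Kubernetes accepts.
_SUFFIXES = {
    "Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40,
    "k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12,
    "": 1,
}

def memory_quantity_to_bytes(q: str) -> int:
    """Parse a simple memory quantity such as '512Mi', '1Gi' or '500M' into bytes."""
    m = re.fullmatch(r"(\d+(?:\.\d+)?)(Ki|Mi|Gi|Ti|k|M|G|T)?", q.strip())
    if not m:
        raise ValueError(f"unsupported quantity: {q!r}")
    number, suffix = m.group(1), m.group(2) or ""
    return int(float(number) * _SUFFIXES[suffix])

print(memory_quantity_to_bytes("512Mi"))  # 536870912
print(memory_quantity_to_bytes("500M"))   # 500000000
```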

All modern Unix systems have a way to kill processes when they need to reclaim memory. In Kubernetes, a container killed this way shows the reason OOMKilled with exit code 137. Exit code 137 is 128 + 9, meaning the process received signal 9 (SIGKILL). Together with the OOMKilled reason, it means the process used more memory than allowed and had to be terminated.
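Exit codes above 128 follow the shell convention of "killed by signal (code − 128)". The sketch below decodes them with Python’s standard `signal` module, assuming a Linux host where signal 9 is SIGKILL. Note that 137 on its own only tells you SIGKILL was sent. The OOMKilled reason in the Pod status is what identifies the OOM killer as the sender.

```python
import signal

def describe_exit_code(code: int) -> str:
    """Explain a container exit code; codes above 128 mean 'killed by signal (code - 128)'."""
    if code > 128:
        signum = code - 128
        try:
            return f"killed by {signal.Signals(signum).name}"
        except ValueError:
            return f"killed by signal {signum}"  # not a signal known on this platform
    return f"exited with status {code}"

print(describe_exit_code(137))  # killed by SIGKILL
print(describe_exit_code(1))    # exited with status 1
```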

This is a Linux kernel feature: the kernel computes an oom_score for each process running on the system, and the higher the score, the more likely the process is to be killed. The kernel also allows setting a value called oom_score_adj to shift that score, which Kubernetes uses to implement its Quality of Service classes. Finally, when memory runs out, the kernel’s OOM killer reviews the processes and terminates the ones with the highest scores, typically those using more memory than they should.
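The kubelet picks oom_score_adj from a Pod’s QoS class. The sketch below follows the formula in the Kubernetes node-pressure eviction documentation: −997 for Guaranteed, 1000 for BestEffort, and for Burstable a value based on the memory request’s fraction of node memory, clamped between 2 and 999. Treat it as an approximation. The exact floor and special cases (for example, node-critical Pods) can differ between Kubernetes versions.

```python
GUARANTEED_OOM_SCORE_ADJ = -997
BEST_EFFORT_OOM_SCORE_ADJ = 1000

def oom_score_adj(qos_class: str, memory_request_bytes: int = 0,
                  node_memory_bytes: int = 1) -> int:
    """Approximate the oom_score_adj the kubelet assigns for a given QoS class."""
    if qos_class == "Guaranteed":
        return GUARANTEED_OOM_SCORE_ADJ
    if qos_class == "BestEffort":
        return BEST_EFFORT_OOM_SCORE_ADJ
    if qos_class == "Burstable":
        # Bigger requests relative to node memory -> lower score -> killed later
        adj = 1000 - (1000 * memory_request_bytes) // node_memory_bytes
        return min(max(2, adj), 999)
    raise ValueError(f"unknown QoS class: {qos_class!r}")

GiB = 2**30
print(oom_score_adj("Burstable", 1 * GiB, 16 * GiB))  # 938
```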

Note that in Kubernetes, a process can reach any of these limits:

Here’s a use case where the Pod’s memory usage is very high; assume the Pod is part of a Deployment.

Memory overcommitment

Limits can be higher than requests, so the sum of all limits on a node can be higher than its capacity. This is called overcommit, and it is very common. In practice, if many containers use more memory than they requested at the same time, they can exhaust the node’s memory. This usually causes the death of some Pods, through kubelet evictions or kernel OOM kills, in order to free memory.
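A simple way to see how exposed a node is: compare total limits and total requests with the node’s allocatable memory. The figures below are made-up examples. A limits ratio above 1.0 means the node is overcommitted. The shortfall is how much memory would be missing if every container grew to its limit at once.

```python
def memory_overcommit(limits_bytes: list[int], requests_bytes: list[int],
                      allocatable_bytes: int) -> dict:
    """Summarize how far a node's memory limits exceed what it can actually provide."""
    total_limits = sum(limits_bytes)
    total_requests = sum(requests_bytes)
    return {
        "limits_ratio": total_limits / allocatable_bytes,      # > 1.0 means overcommitted
        "requests_ratio": total_requests / allocatable_bytes,  # should stay <= 1.0
        "worst_case_shortfall_bytes": max(0, total_limits - allocatable_bytes),
    }

GiB = 2**30
report = memory_overcommit(
    limits_bytes=[4 * GiB, 4 * GiB, 8 * GiB],
    requests_bytes=[2 * GiB, 2 * GiB, 4 * GiB],
    allocatable_bytes=12 * GiB,
)
print(round(report["limits_ratio"], 2))               # 1.33
print(report["worst_case_shortfall_bytes"] / GiB)     # 4.0
```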

Monitoring Kubernetes OOM

When using node exporter with Prometheus, there’s a metric called node_vmstat_oom_kill that counts OOM kills on the node. It’s important to track when an OOM kill happens, but you might want to get ahead of it and have visibility before such an event happens.

Instead, you can check how close a container’s memory usage is to its Kubernetes limit:

(sum by (namespace,pod,container) (container_memory_working_set_bytes{container!=""}) / sum by (namespace,pod,container) (kube_pod_container_resource_limits{resource="memory"})) > 0.8
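The same 80%-of-limit idea, applied to one container’s memory working set, looks like this in plain code. It also reports how many bytes remain before the limit is hit. The 0.8 threshold is just the example value; the function name is hypothetical.

```python
def memory_alert(working_set_bytes: int, limit_bytes: int,
                 threshold: float = 0.8) -> tuple[bool, int]:
    """Flag a container whose working set exceeds `threshold` of its memory limit.

    Returns (should_alert, bytes_left_before_limit).
    """
    if limit_bytes <= 0:
        raise ValueError("container has no memory limit")
    ratio = working_set_bytes / limit_bytes
    return ratio > threshold, limit_bytes - working_set_bytes

MiB = 2**20
print(memory_alert(420 * MiB, 512 * MiB))  # (True, 96468992)
```

Watching the remaining headroom, not just the ratio, helps with small limits, where a few tens of MiB is all that stands between a container and an OOM kill.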

Consequences of Resource Overload:

High CPU usage, high memory usage, or both in pods can lead to a chain reaction of unpleasantness:

Diagnosing the Root Cause:

Taming the Resource Monsters:

So, how do we keep our pods from turning into resource-guzzling gremlins? Here are some strategies:

Proactive Strategies:

Beyond Efficiency:

By incorporating these deeper insights and proactive strategies, we can transform our Kubernetes clusters from reactive resource battlegrounds into proactive, efficient, and cost-effective ecosystems. Remember, the journey toward optimal resource management is continuous, requiring constant monitoring, adaptation, and innovation.

Conclusion

High CPU and memory usage in Kubernetes pods is a recipe for disaster. By understanding the consequences and implementing proper resource management techniques, you can keep your cluster running smoothly and avoid the wrath of the OOM killer. Remember, in the Kubernetes ecosystem, balanced resources are key to healthy and happy pods!
