Kubernetes Pod Graceful Shutdown – How?


Recently we looked at the pod creation lifecycle. If you read that post, you can probably guess how pod deletion works: it is roughly the reverse of pod creation, and it starts with removing the endpoints. Let’s look at how pod deletion works and how to make a Kubernetes pod shut down gracefully.

Terminating a pod

Over an application's lifecycle, pods get terminated for various reasons: a user runs kubectl delete, an update replaces them, or the node runs short of resources. In each case, Kubernetes lets the containers running in the pod shut down gracefully, given some configuration. Before looking at that configuration, let's understand how the delete/termination operation proceeds.

Once the user runs the kubectl delete command, the request is passed to the API server. From there, the pod's endpoint is removed from the Endpoints object. As we saw with pod creation, the endpoint list has to be kept up to date for any Service to route traffic correctly.

During this operation the readiness probe is ignored and the endpoint is removed directly in the control plane. This triggers events to kube-proxy, the ingress controller, DNS, and so on.

All of those components then update their references and stop sending traffic to the pod's IP address. Note that this is usually a quick operation, but a component may be busy with other work, so expect some delay: the references are not updated immediately.

At the same time, the status of the pod in etcd changes to Terminating.


The kubelet is notified of the change and, as with pod creation, delegates the work to other components:

  • The Container Storage Interface (CSI) unmounts any volumes from the container.
  • The Container Network Interface (CNI) detaches the container from the network and releases its IP address.
  • The Container Runtime Interface (CRI) destroys the container.

The following image illustrates the changes being performed.

Hopefully the image is clear and shows the key difference between pod creation and deletion. During pod creation, Kubernetes waits for the kubelet to report the pod's IP details and only then updates the endpoints. When a pod is terminated, Kubernetes removes the endpoint and notifies the kubelet at the same time.

How could this be an issue? Here is the catch: as we said, components sometimes take time to update their endpoints. If the pod is deleted before the endpoint removal has propagated, we will face downtime. But why?

Because the ingress or other higher-level services have not been updated yet, they still forward traffic to a pod that has already been removed. We might think it is Kubernetes' responsibility to propagate the change across the cluster and prevent such an issue.

But it definitely is not.

Kubernetes uses the Endpoints object and newer abstractions such as EndpointSlices to distribute endpoints, but it does not verify that the components have applied the changes.

So how can we avoid this scenario? It causes downtime, and with it we cannot maintain 100% application uptime. The only option is for the pod to wait until the endpoint removal has propagated before it shuts down. That is a guess based on the situation, but is it possible? Let’s check.

terminationGracePeriodSeconds

For that we need a deeper understanding of what happens in the containers when the delete is issued.

When a pod is deleted, its containers receive the SIGTERM signal. By default, Kubernetes sends SIGTERM and then waits 30 seconds before force-killing the process. The application can use that window to do things like:

  • Wait for some time before exiting.
  • Keep processing traffic for a while, for example 10-20 seconds.
  • Then close all backend connections, such as database and WebSocket connections.
  • Finally, exit the process.

If your application needs more time (more than 30 seconds) to stop, you can set or change terminationGracePeriodSeconds in your pod definition.

You can also include a script that waits for some time and then exits. For this, Kubernetes exposes a preStop hook in the pod, which runs before SIGTERM is sent. You can define it like below:

apiVersion: v1
kind: Pod
metadata:
  name: test-pod
spec:
  containers:
    - name: nginx
      image: nginx
      ports:
        - name: nginx
          containerPort: 80
      lifecycle:
        preStop:
          exec:
            command: ["sleep", "10"]

With this option, the kubelet first runs the preStop hook (a 10-second sleep here) and only then sends SIGTERM. Note that this may still not be sufficient, as your application may still be processing some old requests, and the time spent in the preStop hook counts against the grace period (30 seconds by default). How do you avoid that? By setting “terminationGracePeriodSeconds”: with this setting, Kubernetes waits longer before it force-kills the container. The final manifest looks like this:

apiVersion: v1
kind: Pod
metadata:
  name: test-pod
spec:
  containers:
    - name: nginx
      image: nginx
      ports:
        - name: nginx
          containerPort: 80
      lifecycle:
        preStop:
          exec:
            command: ["sleep", "10"]
  # Pod-level field: a sibling of "containers", not part of "lifecycle"
  terminationGracePeriodSeconds: 45

This setting should give the application enough time to process the remaining requests and close its connections, which avoids a forceful shutdown.

Command line

You can also override the default grace period when you manually delete a resource with the kubectl delete command by adding the --grace-period=SECONDS parameter. For example:

# kubectl delete deployment test --grace-period=60

Rolling updates

Another reason pods get deleted is an upgrade or the deployment of a new version. Let's assume you are running version v1.1 with 3 replicas and are now upgrading to v1.2. What happens? Kubernetes:

  • Creates a Pod with the new container image.
  • Destroys an existing Pod.
  • Waits for the Pod to be ready.
  • Repeats until all the pods run the new version.

This makes sure the new version gets deployed, but what about the old pods? Does Kubernetes wait until each old pod is fully deleted? The answer is no.

It keeps moving on while the old pods are terminated and removed gracefully. So at times you may see up to twice the number of pods, because the old ones are still being removed.
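How many extra pods a rollout may create is governed by the Deployment's rolling update strategy. The manifest below is a minimal sketch rather than the setup used in this post: the name test, the app label, and the image tag are assumptions chosen for illustration.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: test                 # hypothetical name
spec:
  replicas: 3
  selector:
    matchLabels:
      app: test              # hypothetical label
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1            # at most 1 pod above the desired replica count
      maxUnavailable: 0      # keep all 3 replicas available during the rollout
  template:
    metadata:
      labels:
        app: test
    spec:
      terminationGracePeriodSeconds: 45
      containers:
        - name: nginx
          image: nginx:1.27  # assumed tag standing in for the new version
```

Pods that are already Terminating are generally not counted against these limits, which is one reason more pods than replicas plus maxSurge can be visible for a while.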

Terminating long-running tasks

Even after taking all these precautions, some applications hold long-lived connections such as WebSockets, or run very long operations or requests that cannot be interrupted. In those cases a rolling update is risky. How can we overcome this?

There are two options:

  1. Increase terminationGracePeriodSeconds to a couple of hours.
  2. Create a new deployment instead of updating the existing one.

Option 1: When you do so, the endpoint of the pod is unreachable in the meantime. Also note that you cannot use monitoring tools to track those pods and have to monitor them manually. Monitoring tools collect their information from the endpoints, so once an endpoint is removed, the tools lose track of the pod as well.
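For option 1, the grace period is set at the pod level. Here is a minimal sketch of the relevant part of a Deployment's pod template, assuming two hours (7200 seconds) covers the longest task; the value and the label are illustrative only.

```yaml
# Fragment of a Deployment spec (spec.template), not a complete manifest
template:
  metadata:
    labels:
      app: test                            # hypothetical label
  spec:
    terminationGracePeriodSeconds: 7200    # 2 hours, an assumed upper bound
    containers:
      - name: nginx
        image: nginx
```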

Option 2: When you create a new deployment, the old one stays in place, so all the long-running processes keep running until they complete. Once you see that the processes have completed, you remove the old deployment manually.

If you wish to delete the old pods automatically, you can set up an autoscaler that scales the deployment to zero replicas when it runs out of tasks. This approach is useful whenever you need to keep previous pods running for longer than the grace period.
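One way to sketch option 2 is to give each release its own Deployment name and a version label, so that applying the new manifest creates a second Deployment instead of rolling the first one. The names test-v1-1 and test-v1-2, the app and version labels, and the image tag are all assumptions for illustration.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: test-v1-2            # hypothetical; the v1.1 release would be test-v1-1
spec:
  replicas: 3
  selector:
    matchLabels:
      app: test
      version: v1-2          # distinct selector, so the old pods are not adopted
  template:
    metadata:
      labels:
        app: test
        version: v1-2
    spec:
      containers:
        - name: nginx
          image: nginx:1.27  # assumed tag standing in for v1.2
```

Because test-v1-1 is a separate object, its pods keep running until you scale that Deployment down or delete it.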

Another excellent example is WebSockets.

If you are serving real-time updates to your clients, you might not want to terminate the WebSockets every time there is a release. If you are frequently releasing during the day, that could lead to several interruptions to real-time feeds.

Creating a new Deployment for every release is a less obvious but better choice. Existing users can continue receiving updates from the old Pods while the most recent Deployment serves the new users. As users disconnect from the old Pods, you can gradually decrease the replicas and retire the old Deployments.

Hope this is useful. Check the options and pick whichever matches the needs of your environment.

