Vertical Pod Autoscaler (VPA): Know Everything About It


In our recent posts, we looked at the Kubernetes Autoscaler and the Horizontal Pod Autoscaler. Those articles help in understanding autoscaling on Kubernetes. In this post, we look at another autoscaling method, the Vertical Pod Autoscaler. Like the other autoscalers, it collects metrics from the metrics server and takes the required action. The difference is that it updates the resources of the pods directly. Let's look at the Vertical Pod Autoscaler in detail.

What is Vertical Pod Autoscaler (VPA)?

Vertical Pod Autoscaler (VPA) frees users from the necessity of setting up-to-date resource limits and requests for the containers in their pods. When configured, it sets the requests automatically based on usage, allowing proper scheduling onto nodes so that the appropriate amount of resources is available for each pod. It also maintains the ratios between limits and requests that were specified in the initial container configuration.

It can both scale down pods that are over-requesting resources and scale up pods that are under-requesting resources, based on their usage over time.

Autoscaling is configured with a Custom Resource Definition object called VerticalPodAutoscaler. It lets you specify which pods should be vertically autoscaled, as well as whether and how the resource recommendations are applied.

Kubernetes VPA resource configuration types

With VPA, there are two different types of resource configurations that we can manage on each container of a pod:

  • Requests
  • Limits

What is a request?

Requests define the minimum amount of resources that a container needs. For example, an application can use more than 256MB of memory, but Kubernetes will guarantee a minimum of 256MB to the container if its request is 256MB of memory.

What are limits?

Limits define the maximum amount of resources that a given container can consume. Your application might require at least 256MB of memory, but you might want to ensure that it doesn’t consume more than 512MB of memory, i.e., to limit its memory consumption to 512MB.
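
As a minimal sketch of how the request and limit described above are written in a pod specification: the pod name, container name, and image below are hypothetical, and the 256MB/512MB figures from the text are written as 256Mi/512Mi on the assumption that binary units are intended.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: demo-pod            # hypothetical name
spec:
  containers:
    - name: app             # hypothetical container name
      image: nginx          # placeholder image
      resources:
        requests:
          memory: 256Mi     # amount Kubernetes guarantees to the container
        limits:
          memory: 512Mi     # maximum the container may consume
```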

Components of VPA

A VPA deployment has three main components: VPA Recommender, VPA Updater, and VPA Admission Controller. Let’s take a look at what each component does.

  • The VPA Recommender

Monitors resource utilization and computes target values. Looks at the metric history, OOM events, and the VPA deployment spec and suggests fair requests. The limits are raised/lowered based on the limits-requests proportion defined.

  • The VPA Updater

Evicts those pods that need the new resource limits. Implements whatever the Recommender recommends if “updateMode: Auto” is defined.

  • The VPA Admission Controller

Changes the CPU and memory settings (using a webhook) before a new pod starts whenever the VPA Updater evicts and restarts a pod.

When the Vertical Pod Autoscaler is set with an updateMode of “Auto,” VPA evicts a pod if it needs to change the pod’s resource requests. Due to the design of Kubernetes, the only way to modify the resource requests of a running pod is to recreate the pod, so downtime is expected for single-replica deployments.

How does Kubernetes VPA work?

Now that we’ve defined the components of VPA, let’s explore how they work together in practice.

Here we try to explain the VPA operation flow.

Let’s walk through exactly what’s happening in the diagram:

  • The user configures VPA.
  • VPA Recommender reads the VPA configuration and the resource utilization metrics from the metric server.
  • VPA Recommender provides pod resource recommendations.
  • VPA Updater reads the pod resource recommendations.
  • VPA Updater initiates the pod termination.
  • The deployment realizes the pod was terminated and will recreate the pod to match its replica configuration.
  • When the pod is in the recreation process, the VPA Admission Controller gets the pod resource recommendation. Since Kubernetes does not support dynamically changing the resource limits of a running pod, VPA cannot update existing pods with new limits. It terminates pods that are using outdated limits. When the pod’s controller requests the replacement from the Kubernetes API server, the VPA Admission Controller injects the updated resource request and limit values into the new pod’s specification.
  • Finally, the VPA Admission Controller writes the recommended values into the pod. In our example, the VPA Admission Controller sets the pod’s CPU to “250m”.

Note:

We can also run VPA in recommendation mode. In this mode, the VPA Recommender will update the status field of the workload’s Vertical Pod Autoscaler resource with its suggested values, but will not terminate pods or alter pod API requests.
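
For orientation, the status field mentioned in this note has roughly the following shape. This is an illustrative sketch only: the container name and all numbers are placeholders chosen for the example, not output from a real cluster.

```yaml
# Illustrative shape only; the values are placeholders, not real output.
status:
  recommendation:
    containerRecommendations:
      - containerName: my-app     # hypothetical container name
        lowerBound:               # lowest request VPA considers reasonable
          cpu: 100m
          memory: 128Mi
        target:                   # the request VPA would set
          cpu: 250m
          memory: 256Mi
        upperBound:               # highest request VPA considers reasonable
          cpu: 500m
          memory: 512Mi
```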

Kubernetes VPA Auto-Update Mode

There are multiple valid options for updateMode in VPA. They are:

  • Off – VPA will only provide the recommendations, and it will not automatically change resource requirements.
  • Initial – VPA only assigns resource requests on pod creation and never changes them later.
  • Recreate – VPA assigns resource requests on pod creation time and updates them on existing pods by evicting and recreating them.
  • Auto – VPA recreates the pod based on the recommendation.

Instead of applying the recommended changes manually to scale a pod, we can have VPA apply them automatically by using the updateMode: “Auto” parameter.
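
To illustrate how a mode is selected, here is a sketch of a VPA object running in recommendation-only mode. The names my-app-vpa and my-app are placeholders; swapping "Off" for "Initial", "Recreate", or "Auto" selects the other modes listed above.

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa          # placeholder name
spec:
  targetRef:
    apiVersion: "apps/v1"
    kind: Deployment
    name: my-app            # placeholder Deployment name
  updatePolicy:
    updateMode: "Off"       # only compute recommendations, never evict pods
```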

Vertical Pod Autoscaler is not installed by default on Kubernetes; we need to install it ourselves when we need it. The following steps explain how to install it.

Installation

This doc is for installing the latest VPA. For instructions on migration from older versions, see the Migration Doc.

Prerequisites

  • kubectl should be connected to the cluster where you want to install VPA.
  • The metrics server must be deployed in your cluster. Read more about Metrics Server.
  • Make sure there is no additional VPA running on the cluster.

Install command

To install VPA, please download the source code of VPA (for example with git clone https://github.com/kubernetes/autoscaler) and run the following command inside the vertical-pod-autoscaler directory:

# ./hack/vpa-up.sh

Note: the script currently reads environment variables: $REGISTRY and $TAG. Make sure you leave them unset unless you want to use a non-default version of VPA.

Note: If you see the following error during this step:

unknown option -addext

please upgrade openssl to version 1.1.1 or higher (it needs to support the -addext option) or use ./hack/vpa-up.sh on the 0.8 release branch.

The script issues multiple kubectl commands to the cluster that insert the configuration and start all needed pods (see architecture) in the kube-system namespace. It also generates and uploads a secret (a CA cert) used by VPA Admission Controller when communicating with the API server.

To print YAML contents with all resources that would be understood by kubectl diff|apply|… commands, you can use:

# ./hack/vpa-process-yamls.sh print

The output of that command won’t include the secret information generated by the pkg/admission-controller/gencerts.sh script.

After installation the system is ready to recommend and set resource requests for your pods. In order to use it, you need to insert a Vertical Pod Autoscaler resource for each controller that you want to have automatically computed resource requirements. This will be most commonly a Deployment.

Test your installation

A simple way to check if Vertical Pod Autoscaler is fully operational in your cluster is to create a sample deployment and a corresponding VPA config:

# kubectl create -f examples/hamster.yaml

The above command creates a deployment with two pods, each running a single container that requests 100 millicores and tries to utilize slightly above 500 millicores. The command also creates a VPA config pointing at the deployment. VPA will observe the behaviour of the pods, and after about 5 minutes, they should get updated with a higher CPU request (note that VPA does not modify the template in the deployment, but the actual requests of the pods are updated). To see VPA config and current recommended resource requests run:

# kubectl describe vpa

Note: if your cluster has little free capacity these pods may be unable to schedule. You may need to add more nodes or adjust examples/hamster.yaml to use less CPU.

Example VPA configuration

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa
spec:
  targetRef:
    apiVersion: "apps/v1"
    kind:       Deployment
    name:       my-app
  updatePolicy:
    updateMode: "Auto"

Tear down

Note that if you stop running VPA in your cluster, the resource requests for the pods already modified by VPA will not change, but any new pods will get resources as defined in your controllers (i.e. deployment or replicaset) and not according to previous recommendations made by VPA.

To stop using Vertical Pod Autoscaling in your cluster:

  • If running on any cloud, clean up role bindings created in Prerequisites:

# kubectl delete clusterrolebinding myname-cluster-admin-binding

  • Tear down VPA components:
# ./hack/vpa-down.sh

Limits control

When setting limits, VPA will conform to resource policies. It will maintain the limit-to-request ratio specified for all containers. VPA will try to cap recommendations between the min and max of limit ranges. If a limit range and the VPA resource policy conflict, VPA will follow the VPA policy (and set values outside the limit range).

Examples

Keeping limit proportional to request

The container template specifies resource request for 500 milli CPU and 1 GB of RAM. The template also specifies resource limit of 2 GB RAM. VPA recommendation is 1000 milli CPU and 2 GB of RAM. When VPA applies the recommendation, it will also set the memory limit to 4 GB.
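
The arithmetic in this example can be sketched as two resource fragments, before and after VPA applies its recommendation. The fragments assume the GB figures in the text map to Gi units; only the resources section of the container is shown.

```yaml
# Container resources as written in the template
resources:
  requests:
    cpu: 500m
    memory: 1Gi
  limits:
    memory: 2Gi       # limit is 2x the request
---
# Resources on the recreated pod after the recommendation is applied
resources:
  requests:
    cpu: 1000m
    memory: 2Gi       # recommended request
  limits:
    memory: 4Gi       # 2:1 limit/request ratio from the template is kept
```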

Capping to Limit Range

The container template specifies resource request for 500 milli CPU and 1 GB of RAM. The template also specifies resource limit of 2 GB RAM. A limit range sets a maximum limit to 3 GB RAM per container. VPA recommendation is 1000 milli CPU and 2 GB of RAM. When VPA applies the recommendation, it will set the memory limit to 3 GB (to keep it within the allowed limit range) and the memory request to 1.5 GB (to maintain a 2:1 limit/request ratio from the template).

Resource Policy Overriding Limit Range

The container template specifies a resource request for 500 milli CPU and 1 GB of RAM. The template also specifies a resource limit of 2 GB RAM. A limit range sets a maximum limit of 3 GB RAM per container. VPA’s container resource policy requires VPA to set the container’s request to at least 750 milli CPU and 2 GB RAM. The VPA recommendation is 1000 milli CPU and 2 GB of RAM. When applying the recommendation, VPA will set the RAM request to 2 GB (following the resource policy) and the RAM limit to 4 GB (to maintain the 2:1 limit/request ratio from the template).
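
As a rough sketch, the two objects that conflict in this example could be declared as follows. The object names are hypothetical, the GB figures are assumed to map to Gi units, and the wildcard containerName applies the policy to every container in the pod.

```yaml
# LimitRange capping the memory limit per container
apiVersion: v1
kind: LimitRange
metadata:
  name: memory-limit-range    # hypothetical name
spec:
  limits:
    - type: Container
      max:
        memory: 3Gi
---
# VPA whose resource policy sets a floor for the request
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa            # placeholder name
spec:
  targetRef:
    apiVersion: "apps/v1"
    kind: Deployment
    name: my-app              # placeholder Deployment name
  resourcePolicy:
    containerPolicies:
      - containerName: "*"
        minAllowed:
          cpu: 750m
          memory: 2Gi         # the policy wins over the limit range
```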

Starting multiple recommenders

It is possible to start one or more extra recommenders in order to use a different percentile for different workload profiles. For example, you could have 3 profiles: frugal, standard and performance, which will use different TargetCPUPercentile values (50, 90 and 95) to calculate their recommendations.

Please note the usage of the following arguments to override default names and percentiles:

  • --name=performance
  • --target-cpu-percentile=0.95

You can then choose which recommender to use by setting recommenders inside the VerticalPodAutoscaler spec.
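
A sketch of selecting a recommender from the VPA object, assuming an extra recommender named performance is already running in the cluster; my-app-vpa and my-app are placeholder names.

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa          # placeholder name
spec:
  recommenders:
    - name: performance     # must match the name the extra recommender was started with
  targetRef:
    apiVersion: "apps/v1"
    kind: Deployment
    name: my-app            # placeholder Deployment name
```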

Custom memory bump-up after OOMKill

After an OOMKill event was observed, VPA increases the memory recommendation based on the observed memory usage in the event according to this formula: 

recommendation = memory-usage-in-oomkill-event + max(oom-min-bump-up-bytes, memory-usage-in-oomkill-event * oom-bump-up-ratio).

You can configure the minimum bump-up as well as the multiplier by specifying startup arguments for the recommender:

  • oom-bump-up-ratio specifies the memory bump-up ratio when an OOM occurred. The default is 1.2, which means memory will be increased by 20% after an OOMKill event.
  • oom-min-bump-up-bytes specifies the minimal increase of memory after observing an OOM. Defaults to 100 * 1024 * 1024 (=100MiB).

Usage in recommender deployment

  containers:
  - name: recommender
    args:
      - --oom-bump-up-ratio=2.0
      - --oom-min-bump-up-bytes=524288000

Using CPU management with static policy

If you are using CPU management with the static policy for some containers, you probably want the CPU recommendation to be an integer. A dedicated recommendation post-processor can round up the CPU recommendation. Recommendation capping still applies after the round-up.

To activate this feature, pass the flag --cpu-integer-post-processor-enabled when you start the recommender. The post-processor only acts on containers that have a specific configuration: an annotation on your VPA object for each impacted container. The annotation format is the following:

vpa-post-processor.kubernetes.io/{containerName}_integerCPU=true
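
Placed on a VPA object, the annotation might look like the sketch below. The container name my-container is hypothetical and stands in for {containerName}; annotation values are strings in YAML, hence the quotes.

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa          # placeholder name
  annotations:
    # "my-container" is a hypothetical container name
    vpa-post-processor.kubernetes.io/my-container_integerCPU: "true"
spec:
  targetRef:
    apiVersion: "apps/v1"
    kind: Deployment
    name: my-app            # placeholder Deployment name
```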

Troubleshooting

To diagnose problems with a VPA installation, perform the following steps:

  • Check if all system components are running:
# kubectl --namespace=kube-system get pods | grep vpa

The above command should list 3 pods (recommender, updater and admission-controller) all in state Running.

  • Check if the system components log any errors. For each of the pods returned by the previous command do:
# kubectl --namespace=kube-system logs [pod name] | grep -e '^E[0-9]\{4\}'
  • Check that the VPA Custom Resource Definition was created:
# kubectl get customresourcedefinition | grep verticalpodautoscalers

VPA Limitations

While VPA is a helpful tool for recommending and applying resource allocations, it has several limitations. Below are some important points to keep in mind when working with VPA.

  • VPA is not aware of Kubernetes cluster infrastructure variables such as node size in terms of memory and CPU. Therefore, it doesn’t know whether a recommended pod size will fit your node. This means that the resource requests recommendation may be too large to fit any node, and therefore pods may go to a pending state because the resource request can’t be met. Some cloud providers such as GKE provide a cluster autoscaler to spin up more worker nodes addressing pod pending issues, but if the Kubernetes environment has no cluster autoscaler feature, then pods will remain pending, causing downtime.
  • VPA does not support StatefulSets yet. The problem is that scaling pods in a StatefulSet is not simple. Neither starting nor restarting can be done the way it’s done for a Deployment or ReplicaSet. Instead, the pods in a StatefulSet are managed in a well-defined order. For example, a Postgres DB StatefulSet will first deploy the master pod and then deploy the slave or replication pods. The master pod can’t be simply replaced with just any other pod.
  • In Kubernetes, the pod spec is immutable. This means that the pod spec can’t be updated in place. To update or change the pod resource request, VPA needs to evict the pod and re-create it. This will disrupt your workload. As a result, running VPA in auto mode isn’t a viable option for many use cases. Instead, it is used for recommendations that can be applied manually during a maintenance window.
  • VPA won’t work with HPA using the same CPU and memory metrics because it would cause a race condition. Suppose HPA and VPA both use CPU and memory metrics for scaling decisions. HPA will try to scale out (horizontally) based on CPU and memory, while at the same time, VPA will try to scale the pods up (vertically). Therefore, if you need to use both HPA and VPA together, you must configure HPA to use a custom metric such as web requests.
  • VPA is not yet ready for JVM-based workloads. This shortcoming is due to its limited visibility into memory usage for Java virtual machine workloads.
  • The performance of VPA is untested on large-scale clusters. Therefore, performance issues may occur when using VPA at scale. This is another reason why it’s not recommended to use VPA within large production environments.
  • VPA doesn’t consider network and I/O. This is an important issue, since ignoring I/O throughput (for writing to disk) and network bandwidth usage can cause application slow-downs and outages.
  • VPA uses limited historical data. VPA requires eight days of historical data storage before it’s initiated. The limited use of only eight days of data would miss monthly, quarterly, annual, and seasonal fluctuations that could cause bottlenecks during peak usage.
  • VPA requires configuration for each cluster. If you manage a dozen or more clusters, you would have to manage separate configurations for each cluster. More sophisticated optimization tools provide a governance workflow for approving and unifying configurations across multiple clusters.
  • VPA policies lack flexibility. VPA uses a resource policy to control resource computations, and an update policy to control how to apply changes to pods. The policy functionality is, however, limited. For example, the resource policy sets a higher and a lower value calculated based on historical CPU and memory measurements aggregated into percentiles (e.g., the 95th percentile), and you can’t choose a more sophisticated machine learning algorithm to predict usage.
  • Whenever VPA updates the pod resources, the pod is recreated, which causes all running containers to be recreated. The pod may be recreated on a different node.
  • The VPA admission controller is an admission webhook. If you add other admission webhooks to your cluster, it is important to analyze how they interact and whether they may conflict with each other. The order of admission controllers is defined by a flag on API server.
  • Multiple VPA resources matching the same pod have undefined behaviour.