FoxuTech

Horizontal Pod Autoscaler(hpa) – Know Everything About it

Horizontal Pod Autoscaler

In our recent post, we have discussed about Kubernetes autoscaling and methods of autoscaling with benefits and best practices. In this article, we are going to learn about one of Autoscaling method called Horizontal Pod Autoscaler. This method has own advantage as like other methods, and this is widely adopted method also, as per our understanding. Let’s see in detail about this method. 

What Is Horizontal Pod Autoscaler (HPA)?

A Kubernetes cluster is made up of one or more virtual machines called nodes. In Kubernetes, a pod is the smallest resource in the hierarchy and your application containers are deployed as pods. A pod is a logical construct in Kubernetes and requires a node to run, and a node can have one or more pods running inside of it.

Horizontal Pod Autoscaler is a type of autoscaler that can increase or decrease the number of pods in a Deployment, ReplicationController, StatefulSet, or ReplicaSet, usually in response to CPU utilization patterns. This process represents horizontal scaling because it changes the number of instances, not the resources allocated to a given container.

How Does HPA Work and What Are Its Benefits?

By default, HPA scales workloads based on pod metrics like average CPU/memory utilization and average pod utilization. It is also possible to use externally provided or custom metrics. After the initial setup, it can operate automatically — you only need to define the minimum and the maximum number of replicas, as per your requirements or demand. 

The configured HPA controller is responsible for checking metrics and scaling replicas accordingly by adding or removing pods. This scaling occurs automatically, but you can sometimes account for predictable fluctuations in loading requirements. HPA works in a loop by checking, updating, and re-checking metrics.

In the first step of the HPA loop, the controller continuously tracks resource (like CPU, Memory and other custom) utilization via the metrics server. Next, HPA calculates the optimal number of replicas based on the resource requirements. Then, the autoscaler decides whether to scale the application up or down. In the last step of the loop, HPA implements the target number of replicas.

As HPA is a continuous monitoring process, so this loop repeats as soon as it finishes. The default interval for HPA checks is 30 seconds. Use the --horizontal-pod-autoscaler-sync-period controller manager flag to change the interval value.

The autoscaling/v1 API version of the HPA only supports the average CPU utilization metric. The autoscaling/v2 API version allows scaling according to memory usage, defining custom metrics, and using multiple metrics in a single HPA object.

What Is the Impact of HPA on Kubernetes Resource Costs?

Running multiple workloads on a server instance can be cost-effective but tracking your Kubernetes costs and identifying where you can save is challenging. Autoscaling lets you tightly configure scaling to reduce waste and minimize application running costs.

Application usage often changes over time, requiring more or fewer pod replicas. HPA scales your workloads automatically. It is useful for stateless and stateful applications. Combining HPA with cluster scaling helps reduce costs for workloads with frequent demand changes, decreasing the number of nodes alongside the pods.

Properly configured, the HPA controller can monitor pods to determine if the number of replicas needs changing. It compares the current value to the target value.

How to Use HPA Metrics

As discussed above, the Horizontal Pod Autoscaler (HPA) enables horizontal scaling of container workloads running in Kubernetes. In order for HPA to work, the Kubernetes cluster needs to have metrics enabled. See how to enable metrics in the Kubernetes metrics server tool.

Kubernetes HPA supports four kinds of metrics:

Resource Metric

Resource metrics refer to CPU and memory utilization of Kubernetes pods against the values provided in the limits and requests of the pod spec. These metrics are natively known to Kubernetes through the metrics server. The values are averaged together before comparing them with the target values. That is, if three replicas are running for your application, the utilization values will be averaged and compared against the CPU and memory requests defined in your deployment spec.

Object Metric

Object metrics describe the information available in a single Kubernetes resource. An example of this would be hits per second for an ingress object.

Pod Metric

Pod metrics (referred to as PodsMetricSource) references pod-based metric information at runtime and can be collected in Kubernetes. An example would be transactions processed per second in a pod. If there are multiple pods for a given PodsMetricSource, the values will be collected and averaged together before being compared against the target threshold values.

External Metrics

External metrics are metrics gathered from sources running outside the scope of a Kubernetes cluster. For example, metrics from Prometheus can be queried for the length of a queue in a cloud messaging service, or QPS from a load balancer running outside of the cluster.

Horizontal Pod Autoscaler API Versions

API version autoscaling/v1 is the stable and default version; this version of API only supports CPU utilization-based autoscaling.

autoscaling/v2 version of the API brings usage of multiple metrics, custom and external metrics support.

You can verify which API versions are supported on your cluster by querying the api-versions. This command lists, all the versions.

# kubectl api-versions | grep autoscaling
autoscaling/v1
autoscaling/v2
autoscaling/v2beta1
autoscaling/v2beta2

Requirements

Horizontal Pod Autoscaler (and also Vertical Pod Autoscaler) requires a Metrics Server installed in the Kubernetes cluster. Metric Server is a container resource metrics (such as memory and CPU usage) source that is scalable, can be configured for high availability, and is efficient on resource usage when operating. Metrics Server gather metrics -by default- every 15 seconds from Kubelets, this allows rapid autoscaling,

You can easily check if the metric server is installed or not by issuing the following command:

# kubectl top pods

The following message will be shown if the metrics server is not installed.

error: Metrics API not available

On the other hand, if the Metric Server is installed, you should get appropriate output with resource utilization.

Installation of Metrics Server

If you have already installed Metrics Server, you can skip this section.

Metrics Server offers two easy installation mechanisms; one is using kubectl that includes all the manifests.

# kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml

The second option is using the Helm chart, which is preferred. Helm values can be found here.

First, add the Metrics-Server Helm repository to your local repository list as follows.

# helm repo add metrics-server https://kubernetes-sigs.github.io/metrics-server/

Now you can install the Metrics Server via Helm.

# helm upgrade --install metrics-server metrics-server/metrics-server

If you have a self-signed certificate, you should add --set args={--kubelet-insecure-tls} to the command above.

Verifying the Installation

As the installation is finished and we allow some time for the Metrics Server to get ready, let’s try the command again.

# kubectl top pods -n argocd
NAME CPU(cores) MEMORY(bytes)
argocd-application-controller-0 4m 24Mi
argocd-applicationset-controller-596ddc6c7d-d7lgl 4m 29Mi
argocd-dex-server-78c894df5b-svc87 14m 28Mi
argocd-notifications-controller-6f65c4ccdb-5cpb8 3m 22Mi
argocd-redis-ha-haproxy-787f9b5689-rpn62 6m 71Mi
argocd-redis-ha-server-0 13m 20Mi
argocd-repo-server-75b7c59bfb-cqtbz 15m 26Mi
argocd-server-d86d7959d-sd98v 16m 31Mi

Also, we can see the resources of the nodes with a similar command.

# kubectl top nodes
NAME CPU(cores) CPU% MEMORY(bytes) MEMORY%
aks-agentpool-12792500-vmss000002 125m 3% 1233Mi 9%

You can also send queries directly to the Metric Server via kubectl.

# kubectl get --raw /apis/metrics.k8s.io/v1beta1/nodes | jq

We can also verify our pod’s metrics from the API.

# kubectl get --raw /apis/metrics.k8s.io/v1beta1/namespaces/argocd/pods/argocd-server-d86d7959d-sd98v | jq

This is either the Metric Server control loop that hasn’t run yet, is not running correctly, or resource requests are not set on the target pod spec.

How to Configure Horizontal Pod Autoscaling?

As an illustration of the horizontal pod autoscaling capabilities, this article will show you how to:

Create a Deployment

The following section shows how to create a Docker image for a small PHP app that performs a resource-intensive calculation.

1. Create the app directory:

# mkdir hpa-test && cd hpa-test

2. Create a Dockerfile in a text editor:

# vim Dockerfile

3. Put the following contents into the file:

FROM php:5-apache
COPY index.php /var/www/html/index.php
RUN chmod a+rx index.php

4. Create the index.php file:

# vim index.php
<?php
$x = 0.0001;
for ($i = 0; $i <= 1000000; $i++) {
$x += sqrt($x);
}
echo "OK!";
?>

5. Add the mathematical operation to the file to create a CPU load.

6. Build the Docker image:

# docker build -t hpa-test:v1 .

7. Tag the image. Use your Docker Hub account username:

# docker image tag hpa-test:v1 motoskia/hpa-test:v1

8. Push the image to Docker Hub by typing:

# docker image push motoskia/hpa-test:v1

9. Create a deployment YAML in a text editor:

# vim hpa-test.yaml

10. The YAML defines the deployment and the service that exposes it. The spec.template.spec.containers section specifies that the deployment uses the Docker image created in the previous steps. Furthermore, the resources sub-section contains resource limits and requests.

11. Create the deployment by using the kubectl apply command:

# kubectl apply -f hpa-test.yaml -n foxutech

12. Confirm that the objects are ready and running.

# kubectl get all -n foxutech

Create HPA

With the deployment up and running, proceed to create a HorizontalPodAutoscaler object. The sections below illustrate the two methods for creating HPAs.

Install the Horizontal Pod Autoscaler

We now have the sample application as part of our deployment, and the service is accessible on port 80. To scale our resources, we will use HPA to scale up when traffic increases and scale down the resources when traffic decreases.

Let’s create the HPA configuration file as shown below:

# cat hpa.yaml
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
name: hpa-demo-deployment
spec:
scaleTargetRef:
apiVersion: apps/v1
kind: Deployment
name: hpa-demo-deployment
minReplicas: 1
maxReplicas: 10
targetCPUUtilizationPercentage: 50

Apply the changes:

# kubectl apply -f hpa.yaml -n foxutech
horizontalpodautoscaler.autoscaling/hpa-test created

Verify the HPA deployment:

# kubectl get hpa -n foxutech
NAME REFERENCE TARGETS MINPODS MAXPODS REPLICAS AGE
hpa-test Deployment/hpa-test 0%/50% 1 10 1 3m19s

The above output shows that the HPA maintains between 1 and 10 replicas of the pods controlled by the hpa-test. In the example shown above (see the column titled “TARGETS”), the target of 50% is the average CPU utilization that the HPA needs to maintain, whereas the target of 0% is the current usage.

You can modify the max and min on the Yaml. Even you can use CLI, but we recommend to follow the GitOps way to make sure all the changes tracked and we can avoid manual changes.

Increase the Load

Before increasing the load to see the HPA in action, use kubectl get to check the status of the controller.

# kubectl get hpa -n foxutech

The TARGETS column in the output shows that no load is generated. Consequently, the number of created pod replicas is still one.

Now increase the CPU load using the load generator. Execute the following command in a new terminal tab/window:

# kubectl -n foxutech run -i --tty load-generator --rm --image=busybox --restart=Never -- /bin/sh -c "while sleep 0.01; do wget -q -O- http://hpa-test; done"

The command uses the busybox image to generate load and repeatedly query the hpa-test deployment. The deployment performs mathematical operations and engages the CPU.

In the first terminal tab/window, type the following command to watch the status of the HPA:

# kubectl get hpa -w -n foxutech

After a short time, the TARGETS column shows that the CPU load exceeds the limit specified upon the HPA creation. Consequently, the HPA increases the number of replicas to match the current load.

# kubectl get hpa -w -n foxutech
NAME REFERENCE TARGETS MINPODS MAXPODS REPLICAS AGE
hpa-test Deployment/hpa-test 0%/50% 1 10 1 5m20s
hpa-test Deployment/hpa-test 158%/50% 1 10 1 5m32s
hpa-test Deployment/hpa-test 158%/50% 1 10 4 5m47s
hpa-test Deployment/hpa-test 129%/50% 1 10 4 6m2s
hpa-test Deployment/hpa-test 154%/50% 1 10 4 6m17s
hpa-test Deployment/hpa-test 124%/50% 1 10 4 6m32s

Stop Generating Load

To stop generating the CPU load, switch to the load generator terminal tab/window and press CTRL+C.

Type the following command to confirm that the number of replicas is back to one.

# kubectl get hpa -w -n foxutech

Autoscaling Based on Custom Metrics

The autoscaling/v2 API version allows you to create custom metrics to trigger the HPA. To switch to autoscaling/v2, convert the deployment YAML into the new format with the following command:

# vim hpav2.yaml

Open the new file and inspect its contents. For example, to create a memory-related HPA condition, type the following in the spec.metrics section:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: hpa-testv2-memory
spec:
  minReplicas: 2
  maxReplicas: 10
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: hpa-test
  metrics:
  - type: Resource
    resource:
      name: memory
      target:
        type: averageValue
        averageValue: 50Mi

Kubernetes enables several custom metrics to be used for the HPA, including:

Autoscaling Based on Multiple Metrics

The spec.metrics field in autoscaling/v2 allows setting up multiple metrics to be monitored by a single HPA. To specify more than one metric in a YAML, list them one after another. For example, the following section defines both CPU and memory metrics:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: hpa-testv2-multi
spec:
  minReplicas: 2
  maxReplicas: 10
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: hpa-test
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50
  - type: Resource
    resource:
      name: memory
      target:
        type: AverageValue
        averageValue: 50Mi

Limitations of Horizontal Pod Autoscaler

While HPA is most useful for autoscaling a stateless application, it can also work with stateful sets. However, HPA also has some limitations:

Kubernetes HPA Best Practices

When running production workloads with autoscaling enabled, there are a few best practices to keep in mind.

Troubleshooting Kubernetes HPA

Insufficient Time to Scale

A common challenge with HPA is that it takes time to scale up a workload by adding another pod. Loads can sometimes change sharply, and during the time it takes to scale up, the existing pod can reach 100% utilization, resulting in service degradation and failures.

For example, consider a pod that can handle 800 requests with under 80% CPU utilization, and HPA is configured to scale up when the 80% CPU threshold is reached. Let’s say it takes 10 seconds for the new pod to start up.

If loads increase by 100 requests per second, the pod will reach 100% utilization within 2 seconds, while it takes 8 more seconds for the second pod to start receiving requests.

Possible solutions

Brief Spikes in Load

When a workload experiences brief spikes in CPU utilization (or any other scaling metrics), you might expect that HPA will immediately spin up an additional pod. However, if the spikes are short enough, this will not happen.

To understand why, consider that:

For example, assume HPA is set to scale when CPU utilization exceeds 80%. If CPU utilization suddenly spikes to 90%, but this occurs for only 2 seconds out of a 30 second metric resolution window, and in the rest of the 30-second period utilization is 20%, the average utilization is:

(2 * 90% + 28 * 90%) / 30 = 27%

When HPA polls for the CPU utilization metric, it will observe a metric of 27%, which is not even close to the scaling threshold of 80%. This means HPA will not scale — even though in reality, the workload experienced high load.

Possible solutions

Scaling Delay Due to Application Readiness

It can often happen that HPA correctly issues a scaling request, but for various reasons, it takes time for the new container to be up and running. These reasons can include:

Possible solutions

Excessive Scaling

In some cases, HPA might scale an application so much that it could consume almost all the resources in the cluster. You could set up HPA in combination with Cluster Autoscaler to automatically add more nodes to the cluster. However, this might sometimes get out of hand.

Consider these scenarios:

In these, and many similar scenarios, it is better not to scale the application beyond a certain limit. However, HPA does not know this and will continue to scale the application even when this does not make business sense.

Possible solutions

The best solution is to limit the number of replicas that can be created by HPA. You can define this in the spec:maxReplicas field of the HPA configuration:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: hpa-testv2-memory
spec:
  minReplicas: 2
  maxReplicas: 10

In this configuration, maxReplicas is set to 10. Calculate the maximum expected load of your application and ensure you set a realistic maximal scale, with some buffer for surprise peaks in traffic.

Exit mobile version