Horizontal Pod Autoscaling by memory

Kubernetes 1.8 introduced a new version of the HorizontalPodAutoscaler (HPA) that allows metrics other than CPU and custom metrics to be defined. It is now also possible to define metrics such as the request count in an Ingress and memory utilization. In this post I’ll show how to set up a HorizontalPodAutoscaler that scales up or down when memory utilization rises above or falls below a percentage threshold.


The only requirement for the HPA is Heapster. The HPA uses Heapster to obtain metrics such as CPU and memory usage and decides from them whether to scale up, scale down or hold. Installing Heapster is simple and is described on its GitHub page. I used the InfluxDB version with Grafana so that I could also see what happens inside the cluster. For the HPA to work reliably I had to adjust the following parameters in heapster.yaml:

  1. I switched the “--source” command argument from https://kubernetes.default to
  2. I added a command argument called “--metric-resolution=30s”. In my first tests the HPA failed because the metrics could not be loaded. I found this setting in a GitHub issue, and it seems to work better.

Set up a HorizontalPodAutoscaler

A horizontal pod autoscaler is defined as a YAML file. The definition is very easy to understand:

apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
  name: resize-hpa
  namespace: resize
spec:
  scaleTargetRef:
    apiVersion: apps/v1beta1
    kind: Deployment
    name: image-resizer
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: memory
      targetAverageUtilization: 60

The first part says that the HorizontalPodAutoscaler has to run inside the resize namespace (more about it later…). The important part is the reference to the resource to be scaled: in this case a Deployment called image-resizer. Then we define the minReplicas and maxReplicas values, which set how many pods of the deployment should exist. Here a minimum of 2 pods should always exist, and the scaler may scale up to a maximum of 10 pods. Within this range the HorizontalPodAutoscaler can scale as it sees fit. The last part names the resource the HPA should monitor to decide on the number of pods. Here we say that the average memory utilization across the pods should not exceed 60%. That is all that is needed to define a Horizontal Pod Autoscaler. With kubectl apply -f <filename> we can apply the HPA to the cluster.
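To make the scaling decision more concrete, here is a minimal sketch of the replica calculation as it is described in the Kubernetes documentation: the desired count is the current count multiplied by the ratio of current to target utilization, rounded up. The class and method names are made up for this illustration, and the real controller additionally applies a tolerance and the scale delays, which this sketch leaves out.

```java
// Hypothetical helper, not part of Kubernetes: illustrates the documented
// HPA rule desired = ceil(current * currentUtilization / targetUtilization).
final class HpaReplicaSketch {

    static int desiredReplicas(int currentReplicas, double currentUtilization,
                               double targetUtilization, int minReplicas, int maxReplicas) {
        // Round up so that the average utilization ends up at or below the target.
        int desired = (int) Math.ceil(currentReplicas * currentUtilization / targetUtilization);
        // Clamp to the minReplicas/maxReplicas range from the HPA definition.
        return Math.max(minReplicas, Math.min(maxReplicas, desired));
    }

    public static void main(String[] args) {
        // 2 pods at 70% average memory utilization, target 60%
        System.out.println(desiredReplicas(2, 70, 60, 2, 10)); // 3
        // 2 pods at 30%: the raw result is 1, but minReplicas keeps 2
        System.out.println(desiredReplicas(2, 30, 60, 2, 10)); // 2
        // 10 pods at 90%: the raw result is 15, but maxReplicas caps it at 10
        System.out.println(desiredReplicas(10, 90, 60, 2, 10)); // 10
    }
}
```

With the values from the definition above, two pods at 70% utilization lead to a third pod, and the result can never leave the range of 2 to 10 pods.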

Image resizer

The application

To test the HPA a bit I wrote an application that resizes an image to 20% of its initial size. It exposes an endpoint that receives HTTP requests in multipart/form-data format containing an image, resizes the image to 20% and returns it in the HTTP response. The application was written in Spring Boot:

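import java.awt.image.BufferedImage;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import javax.imageio.ImageIO;
import org.springframework.http.*;
import org.springframework.web.bind.annotation.*;
import org.springframework.web.multipart.MultipartFile;

@RestController
public class ImageResizerController {

    private final ImageResizerService imageResizerService;

    public ImageResizerController(ImageResizerService imageResizerService) {
        this.imageResizerService = imageResizerService;
    }

    // The request mapping is missing from this listing; a plain POST mapping is assumed.
    @PostMapping
    public ResponseEntity<?> resize(
            @RequestParam("data") MultipartFile file) {

        try {
            BufferedImage resizedImage = this.imageResizerService.resize(file.getBytes());

            // Encode the resized image as JPEG into an in-memory buffer
            ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream();
            ImageIO.write(resizedImage, "jpg", byteArrayOutputStream);
            byte[] bytes = byteArrayOutputStream.toByteArray();

            // The rest of this builder chain is missing; returning the JPEG bytes is assumed.
            return ResponseEntity.ok()
                    .contentType(MediaType.IMAGE_JPEG)
                    .body(bytes);
        } catch (IOException e) {
            return new ResponseEntity<>(HttpStatus.INTERNAL_SERVER_ERROR);
        }
    }
}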

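import java.awt.Graphics2D;
import java.awt.image.BufferedImage;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import javax.imageio.ImageIO;
import org.springframework.stereotype.Service;

@Service
public class ImageResizerService {

    public BufferedImage resize(byte[] image) throws IOException {
        InputStream in = new ByteArrayInputStream(image);
        BufferedImage imageBuf = ImageIO.read(in);

        // Scale both dimensions to 20% of the original
        int targetHeight = (int) (imageBuf.getHeight() * 0.2);
        int targetWidth = (int) (imageBuf.getWidth() * 0.2);

        BufferedImage resizedImage = new BufferedImage(targetWidth, targetHeight, imageBuf.getType());
        Graphics2D g = resizedImage.createGraphics();
        g.drawImage(imageBuf, 0, 0, targetWidth, targetHeight, null);
        g.dispose(); // release the graphics context once drawing is done

        return resizedImage;
    }
}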

In addition I added Spring Boot Actuator to get health checks, so that a pod is restarted if it hangs. The server is easy to deploy with an ingress, a deployment and a service. For the HPA to decide whether the average memory utilization is above 60%, we need to specify a memory limit; here I used 538Mi. I also set the -Xmx and -Xms parameters for the JVM to 512m, which is about 5% below the memory limit I specified for the container in the deployment's pod template. Here is the deployment:
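The relation between the heap size and the container limit is simple arithmetic, sketched below. The class and method names are only illustrative. Keep in mind that the JVM also needs memory outside the heap (metaspace, thread stacks, buffers), so this headroom is an approximation of what is left for everything else in the container.

```java
import java.util.Locale;

// Hypothetical helper: how much of the container memory limit is not
// covered by the JVM heap configured with -Xmx.
final class HeapHeadroomSketch {

    static double headroomPercent(long limitMiB, long heapMiB) {
        return 100.0 * (limitMiB - heapMiB) / limitMiB;
    }

    public static void main(String[] args) {
        long limitMiB = 538; // container memory limit (538Mi)
        long heapMiB = 512;  // -Xmx512m / -Xms512m

        System.out.println((limitMiB - heapMiB) + " MiB outside the heap"); // 26 MiB outside the heap
        System.out.println(String.format(Locale.ROOT, "%.1f%% of the limit",
                headroomPercent(limitMiB, heapMiB))); // 4.8% of the limit
    }
}
```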

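apiVersion: apps/v1beta2
kind: Deployment
metadata:
  name: image-resizer
  namespace: resize
  labels:
    app: image-resizer
spec:
  replicas: 2
  selector:
    matchLabels:
      app: image-resizer
  template:
    metadata:
      labels:
        app: image-resizer
    spec:
      containers:
      - name: image-resizer
        image: <docker-image>:latest
        resources:
          limits:
            memory: "538Mi"
        env:
        - name: JAVA_OPTS
          value: "-Xmx512m -Xms512m"
        ports:
        - containerPort: 8080
        # The probe keys are missing from this listing; which block is the
        # liveness probe and which the readiness probe is an assumption.
        livenessProbe:
          httpGet:
            path: /health
            port: 8080
          initialDelaySeconds: 30
          timeoutSeconds: 5
          failureThreshold: 3
        readinessProbe:
          httpGet:
            path: /health
            port: 8080
          initialDelaySeconds: 25
          timeoutSeconds: 5
          failureThreshold: 5
      imagePullSecrets:
      - name: gitlab-com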

I also defined an ingress. Here it is important to add the “ingress.kubernetes.io/proxy-body-size” annotation, because otherwise the maximum file size is limited to 1MB. I made the same change in the Spring Boot application.properties file to allow bigger files.

apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: ingress-image-resizer
  namespace: resize
  annotations:
    kubernetes.io/ingress.class: "nginx"
    ingress.kubernetes.io/proxy-body-size: 8m
spec:
  rules:
  - host: <domain>
    http:
      paths:
      - path: /
        backend:
          serviceName: image-resizer
          servicePort: 8080

The infrastructure

To run the test I started a 4-node Kubernetes cluster with 2GB RAM for each node, 1 master and 1 etcd node. In front of the 4 nodes I deployed a DigitalOcean Load Balancer. The load tests were executed from 2 other droplets. These droplets were small servers with Docker installed, so that the tests could run as Docker containers.


The test was written with Gatling and configured so that each load test node made 250 requests over 10 minutes in ramp mode (I think the ramp was not required). With two nodes running the test, that gave a total of 500 requests over 10 minutes to the resize server. The test image has a size of 4.1MB.
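To get a feeling for how much load this actually is, the numbers can be turned into an average request rate and a total upload volume. This is only a back-of-the-envelope sketch with made-up names, assuming the requests are spread evenly over the 10 minutes.

```java
import java.util.Locale;

// Hypothetical helper: derives the average load from the test parameters
// (2 load test nodes, 250 requests each, 10 minutes, 4.1MB image).
final class LoadProfileSketch {

    static int totalRequests(int loadNodes, int requestsPerNode) {
        return loadNodes * requestsPerNode;
    }

    static double requestsPerSecond(int totalRequests, int durationMinutes) {
        return totalRequests / (durationMinutes * 60.0);
    }

    static double totalUploadMegabytes(int totalRequests, double imageMegabytes) {
        return totalRequests * imageMegabytes;
    }

    public static void main(String[] args) {
        int total = totalRequests(2, 250);
        System.out.println(total + " requests"); // 500 requests
        System.out.println(String.format(Locale.ROOT, "%.2f requests per second",
                requestsPerSecond(total, 10))); // 0.83 requests per second
        System.out.println(String.format(Locale.ROOT, "%.0f MB uploaded in total",
                totalUploadMegabytes(total, 4.1))); // 2050 MB uploaded in total
    }
}
```

So on average there is less than one request per second. What makes the test demanding is that every request carries a 4.1MB image that has to be decoded and resized in memory.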

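import io.gatling.core.Predef._
import io.gatling.http.Predef._
import scala.concurrent.duration._

class FileResizeSimulation extends Simulation {

  val httpConf = http
    .baseURL("http://<domain>") // line missing from this listing; base URL is assumed
    .acceptEncodingHeader("gzip, deflate")
    .userAgentHeader("Mozilla/5.0 (Macintosh; Intel Mac OS X 10.8; rv:16.0) Gecko/20100101 Firefox/16.0")

  val scn = scenario("Upload file to be resized")
    // request name and path are missing from this listing; both are assumed
    .exec(http("resize image")
      .post("/")
      .formUpload("data", "testimage.jpg"))

  setUp(scn.inject(rampUsers(250) over (10 minutes)).protocols(httpConf))
}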

What happens without a HorizontalPodAutoscaler

The first try was to run the deployment with 2 replicas and without the HorizontalPodAutoscaler. As a result 93% of the requests failed; only 37 of the 500 requests succeeded. The reason was that the liveness probes took the pods out of service, and when the pods came back they were killed again immediately.


What happened with the HorizontalPodAutoscaler

After this I enabled the HPA and ran the test again. This time only 4 requests failed and the other 496 requests were successful. The only bad thing was the number of requests that took more than 800ms.


The problem was that the period between the start of new pods was 3 minutes, so only 2 new pods came up during the test. The HPA honors a parameter called horizontal-pod-autoscaler-upscale-delay that defines how long to wait after a scale-up before scaling up again. To change it we need to reconfigure our kube-controller-manager.service. There I set the period between new pods to 1 minute:

/usr/bin/kube-controller-manager \
    --allocate-node-cidrs=false \
    --cloud-provider= \
    --cluster-cidr= \
    --cluster-signing-cert-file=/etc/kube-controller-manager/ca.pem \
    --cluster-signing-key-file=/etc/kube-controller-manager/ca-key.pem \
    --leader-elect=true \
    --master=http://localhost:8080 \
    --root-ca-file=/etc/kube-apiserver/kube-ca.pem \
    --service-account-private-key-file=/etc/kube-apiserver/kube-api-key.pem \
    --service-cluster-ip-range= \
    --horizontal-pod-autoscaler-upscale-delay=1m

You should also define how long to wait before scaling a deployment down when the average memory utilization decreases; the corresponding kube-controller-manager parameter is horizontal-pod-autoscaler-downscale-delay.

With this new parameter I ran the test again. The results were very good: only 2 requests failed and only 24 requests took longer than 800 ms:


Here you can see on the left side the response time before I changed the upscale delay from 3 minutes to 1 minute and on the right side the response time after I changed the upscale delay to 1 minute:


What happened in Kubernetes

I watched the HorizontalPodAutoscaler to see what happens inside of the cluster when the Load Test runs. Here are the outputs generated:

$ kubectl describe hpa -n resize resize-hpa

Name:                                                     resize-hpa
Namespace:                                                resize
Labels:                                                   <none>
Annotations:                                              kubectl.kubernetes.io/last-applied-configuration={"apiVersion":"autoscaling/v2beta1","kind":"HorizontalPodAutoscaler
CreationTimestamp:                                        Wed, 01 Nov 2017 00:43:41 +0100
Reference:                                                Deployment/image-resizer
Metrics:                                                  ( current / target )
  resource memory on pods  (as a percentage of request):  70% (399076010666m) / 60%
Min replicas:                                             2
Max replicas:                                             10
Conditions:
  Type            Status  Reason              Message
  ----            ------  ------              -------
  AbleToScale     False   BackoffBoth         the time since the previous scale is still within both the downscale and upscale forbidden windows
  ScalingActive   True    ValidMetricFound    the HPA was able to succesfully calculate a replica count from memory resource utilization (percentage of request)
  ScalingLimited  False   DesiredWithinRange  the desired replica count is within the acceptible range
Events:
  Type    Reason             Age   From                       Message
  ----    ------             ----  ----                       -------
  Normal  SuccessfulRescale  2m    horizontal-pod-autoscaler  New size: 3; reason: memory resource utilization (percentage of request) above target
  Normal  SuccessfulRescale  1m    horizontal-pod-autoscaler  New size: 4; reason: memory resource utilization (percentage of request) above target

The first thing we can see is that the pods currently have a memory utilization of 70%. Next come the Conditions, which describe the current state of the HPA: what it is able to do and what went wrong. Here we can see that the last scaling happened so recently that the HPA is currently not allowed to scale in either direction (AbleToScale is False with the reason BackoffBoth). Next we see that the HPA is able to calculate the memory utilization; to do so it accesses the Heapster service and asks it for the memory information of the pods. If the Heapster connection fails, this condition fails. In my first tests this happened, so I changed the metric-resolution to 30 seconds. That stabilized the HPA.

In the last section you can see what happened with the HPA in the past. Here you can see that the HPA detected a memory utilization of more than 60% and added a new pod to the deployment.
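The raw value in the Metrics line (399076010666m) is a Kubernetes quantity in milli-units, so it has to be divided by 1000 to get bytes. The sketch below shows how the 70% can be recomputed from it. The names are illustrative, it assumes that the memory request of the pods equals the 538Mi limit (Kubernetes uses the limit as the request when only a limit is set), and it truncates the percentage, which matches the value shown in the output.

```java
import java.util.Locale;

// Hypothetical helper: recomputes the utilization shown by
// "kubectl describe hpa" from the raw metric value.
final class HpaUtilizationSketch {

    static final long MIB = 1024L * 1024L;

    // A quantity with the suffix "m" is given in thousandths of the base unit.
    static double milliQuantityToBytes(String quantity) {
        if (!quantity.endsWith("m")) {
            throw new IllegalArgumentException("expected a milli quantity: " + quantity);
        }
        return Long.parseLong(quantity.substring(0, quantity.length() - 1)) / 1000.0;
    }

    // Utilization as a whole percentage of the memory request (truncated).
    static long utilizationPercent(double usedBytes, long requestBytes) {
        return (long) (100.0 * usedBytes / requestBytes);
    }

    public static void main(String[] args) {
        double usedBytes = milliQuantityToBytes("399076010666m");
        long requestBytes = 538L * MIB; // assumption: request equals the 538Mi limit

        System.out.println(String.format(Locale.ROOT, "%.1f MiB used per pod on average",
                usedBytes / MIB)); // 380.6 MiB used per pod on average
        System.out.println(utilizationPercent(usedBytes, requestBytes) + "%"); // 70%
    }
}
```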

Next you can see the running pods and that they start with a delay of 1 minute as specified in the kube-controller-manager service.

$ kubectl get pods
NAME                          READY     STATUS    RESTARTS   AGE
image-resizer-fdb6b7d-45hgs   1/1       Running   0          3m
image-resizer-fdb6b7d-7ltf9   1/1       Running   1          2m
image-resizer-fdb6b7d-94v2t   1/1       Running   1          11m
image-resizer-fdb6b7d-l8qs6   0/1       Running   0          25s
image-resizer-fdb6b7d-v8nzx   1/1       Running   0          11m

Björn Wenzel

My name is Björn Wenzel. I’m a Platform Engineer working for Schenker with interests in Kubernetes, CI/CD, Spring and NodeJS.