Autoscaling Docker hosts on Azure with Virtual Machine Scale Sets & Rancher


A while back, Mark Russinovich announced the public preview of “Virtual Machine Scale Sets”:

VM Scale Sets are an Azure Compute resource you can use to deploy and manage a collection of virtual machines as a set. Scale sets are well suited for building large-scale services targeting big compute, big data, and containerized workloads – all of which are increasing in significance as cloud computing continues to evolve. Scale set VMs are configured identically, you just choose how many you need, which enables them to scale out and in rapidly and automatically.


So here we have a cloud service that enables us to autoscale our hosts based on the load of the underlying systems. Now imagine combining this feature with Docker… I don’t know about you, but I’m excited about this premise! When combining this with Rancher, you could build your own Containers-as-a-Service (CaaS)! Today we’ll be delving into the matter to see how to implement this…


The Design

Here is a quick extract from the ARM Resource Visualizer after loading the ARM template I prepared for this deep dive.

[Screenshot: Azure Resource Visualizer]

So what will we be deploying?

  • A virtual network
  • A load balancer with a public IP
  • A virtual machine scale set with auto scaling
  • Dynamic NAT rules for access to the autoscaled systems


ARM Template

As for the template, let’s start with the following template:

Next up is to enhance this template with some additional extensions:

                    "extensionProfile": {
                      "extensions": [
                        {
                          "name": "LinuxDiagnostic",
                          "properties": {
                            "publisher": "Microsoft.OSTCExtensions",
                            "type": "LinuxDiagnostic",
                            "typeHandlerVersion": "2.1",
                            "autoUpgradeMinorVersion": false,
                            "settings": {
                              "xmlCfg": "[base64(concat(variables('wadcfgxstart'),variables('wadmetricsresourceid'),variables('wadcfgxend')))]",
                              "storageAccount": "[variables('diagnosticsStorageAccountName')]"
                            },
                            "protectedSettings": {
                              "storageAccountName": "[variables('diagnosticsStorageAccountName')]",
                              "storageAccountKey": "[listkeys(variables('accountid'), variables('apiVersion')).key1]",
                              "storageAccountEndPoint": ""
                            }
                          }
                        },
                        {
                          "name": "Docker",
                          "properties": {
                            "publisher": "Microsoft.Azure.Extensions",
                            "type": "DockerExtension",
                            "typeHandlerVersion": "1.0",
                            "autoUpgradeMinorVersion": true,
                            "settings": {
                              "docker": {
                                "port": "2375"
                              }
                            }
                          }
                        },
                        {
                          "name": "RancherAgent",
                          "properties": {
                            "publisher": "Microsoft.OSTCExtensions",
                            "type": "CustomScriptForLinux",
                            "typeHandlerVersion": "1.4",
                            "autoUpgradeMinorVersion": false,
                            "settings": {
                              "fileUris": [
                              ],
                              "commandToExecute": "[concat('./ ', parameters('rancherApi'))]"
                            }
                          }
                        }
                      ]
                    }

What do we see here? Three extensions. The first is the diagnostics extension, which provides the metrics used by the autoscaling mechanism. The next is the Docker extension, which installs Docker properly. Normally I would add a Docker Compose definition there… Though due to technical limitations, I’ve added an additional custom script extension that deploys the Rancher agent. And, of course, afterwards I used this to deploy my Rancher nodes.
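To double-check which extensions a scale set resource declares, a small Python sketch like the one below can help. It is only an illustration: it assumes the "extensionProfile" sits under properties.virtualMachineProfile of the "Microsoft.Compute/virtualMachineScaleSets" resource (the excerpt above does not show its parent), and the VMSS_RESOURCE value is a trimmed, hypothetical stand-in for the real template.

```python
import json

# Trimmed, hypothetical stand-in for the scale set resource in the template.
# Assumption: extensionProfile lives under properties.virtualMachineProfile.
VMSS_RESOURCE = json.loads("""
{
  "type": "Microsoft.Compute/virtualMachineScaleSets",
  "properties": {
    "virtualMachineProfile": {
      "extensionProfile": {
        "extensions": [
          {"name": "LinuxDiagnostic", "properties": {"type": "LinuxDiagnostic"}},
          {"name": "Docker", "properties": {"type": "DockerExtension"}},
          {"name": "RancherAgent", "properties": {"type": "CustomScriptForLinux"}}
        ]
      }
    }
  }
}
""")


def list_extensions(resource):
    """Return (name, type) pairs for every extension on a scale set resource."""
    profile = resource.get("properties", {}).get("virtualMachineProfile", {})
    extensions = profile.get("extensionProfile", {}).get("extensions", [])
    return [(ext["name"], ext["properties"]["type"]) for ext in extensions]


if __name__ == "__main__":
    for name, ext_type in list_extensions(VMSS_RESOURCE):
        print(f"{name}: {ext_type}")
```

An empty result from such a check would hint that the "extensionProfile" ended up at the wrong nesting level.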


How does this look in Azure?

The deployment succeeded…

[Screenshot: azuredeploy-0304-1034 - Microsoft Azure]

And our resources are all there…

[Screenshot: azuredeploy-0304-1034 - Microsoft Azure]

Here you can see a virtual machine scale set (kvaesvmss), several storage accounts on which the nodes of the scale set are stored (…kvaesvmsssa), a virtual network in which they are deployed, and a load balancer (with an external IP) that serves as the external entry point.

So what details do we have for the virtual machine scale set? Not much, to be honest… We can see the size and the capacity. (Side note: check an earlier post to see how to change the capacity manually.)

[Screenshot: Settings - Microsoft Azure]

And the load balancer is also prepared for our pool…

[Screenshot: Backend address pools - Microsoft Azure]


And how does this look in Rancher?

The two hosts have joined our farm. And as a test I’ve deployed Weave Scope on them to verify that everything went fine…

[Screenshot: Rancher]

And the network test between both hosts also went fine… Here I did a ping from the network agent of “kvaesvmss-1” to the one of “kvaesvmss-0”.

[Screenshot: Rancher]


Manual scaling…

Next up is the manual scaling. We’re going to issue a request to manually (down)scale to one node.

[Screenshot: Parameters - Microsoft Azure]

And the request was executed fine.

[Screenshot: Settings - Microsoft Azure]

Which results in a node being unreachable to Rancher… Which is kinda normal, as it got deleted in Azure. 🙂

[Screenshot: Rancher]

So let’s clean that one up… Sadly we have to do this manually, as there is no automatic cleanup in Rancher (for the time being).

[Screenshot: Rancher]


Auto scaling…

For our auto scaling test bed, I’ll be using the “acs-logging-test” containers from Ross Gardler. He demoed these in one of the road shows for the Azure Container Service, and they make a nice setup to demo the power of Docker.

[Screenshot: Rancher]

So I’ve created a separate stack for this test and deployed the “producer” (“acs-logging-test-simulate”) and the “consumer” (“acs-logging-test-analyze”). The “producer” pushes messages to the queue. The “consumer” picks up the messages from the queue, analyzes them, and updates the results in the table.

[Screenshot: Rancher]

As we can see, the system is pretty idle after deploying these two containers.

[Screenshot: Rancher]

So let’s crank up the noise!

[Screenshot: Rancher]

By increasing the scale of the “producer” from 1 to 32 container instances.

[Screenshot: Rancher]

We can immediately see that Rancher starts to deploy new containers to satisfy the container scale we requested.

[Screenshot: Rancher]

And to stress the system a bit more…

[Screenshot]

Or maybe A LOT… 😀

[Screenshot: Rancher]

And after a while we can see that the CPU of our system is becoming saturated.

[Screenshot: Rancher]

And then the auto scale kicks in…

[Screenshot: Rancher]

And keeps kicking in…

[Screenshot: Program Manager]


Autoscale parameters?

For this PoC, we used the following config in our ARM template:

        {
            "type": "Microsoft.Insights/autoscaleSettings",
            "apiVersion": "2015-04-01",
            "name": "autoscalewad",
            "location": "[parameters('resourceLocation')]",
            "dependsOn": [
                "[concat('Microsoft.Compute/virtualMachineScaleSets/', parameters('vmSSName'))]"
            ],
            "properties": {
                "name": "autoscalewad",
                "targetResourceUri": "[concat('/subscriptions/',subscription().subscriptionId, '/resourceGroups/',  resourceGroup().name, '/providers/Microsoft.Compute/virtualMachineScaleSets/', parameters('vmSSName'))]",
                "enabled": true,
                "profiles": [
                    {
                        "name": "Profile1",
                        "capacity": {
                            "minimum": "1",
                            "maximum": "10",
                            "default": "1"
                        },
                        "rules": [
                            {
                                "metricTrigger": {
                                    "metricName": "\\Processor\\PercentProcessorTime",
                                    "metricNamespace": "",
                                    "metricResourceUri": "[concat('/subscriptions/',subscription().subscriptionId, '/resourceGroups/',  resourceGroup().name, '/providers/Microsoft.Compute/virtualMachineScaleSets/', parameters('vmSSName'))]",
                                    "timeGrain": "PT1M",
                                    "statistic": "Average",
                                    "timeWindow": "PT5M",
                                    "timeAggregation": "Average",
                                    "operator": "GreaterThan",
                                    "threshold": 50.0
                                },
                                "scaleAction": {
                                    "direction": "Increase",
                                    "type": "ChangeCount",
                                    "value": "1",
                                    "cooldown": "PT1M"
                                }
                            }
                        ]
                    }
                ]
            }
        }

Here we can see that the rule is based on the average CPU usage: when it exceeds 50% over a five-minute window, the scale set grows in steps of one host, with a one-minute cooldown between scale actions. The minimum of the scale set has been set to 1 and the maximum to 10.
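To make the rule a bit more tangible, here is a small Python sketch that mimics a single evaluation of it. This is a simplified model, not how the Azure autoscale engine is actually implemented: the function name next_capacity is made up for this illustration, the cooldown is ignored, and the sketch assumes one averaged CPU sample per minute.

```python
from statistics import mean

# Values copied from the autoscale settings in the template.
THRESHOLD = 50.0      # "threshold": 50.0 with operator "GreaterThan"
WINDOW_MINUTES = 5    # "timeWindow": "PT5M" at "timeGrain": "PT1M"
STEP = 1              # "value": "1" with type "ChangeCount"
MIN_CAPACITY, MAX_CAPACITY = 1, 10


def next_capacity(cpu_samples, capacity):
    """Return the capacity after evaluating the scale-out rule once.

    cpu_samples holds one average CPU percentage per minute, newest last.
    Simplification: cooldown handling is left out.
    """
    window = cpu_samples[-WINDOW_MINUTES:]
    if len(window) < WINDOW_MINUTES:
        return capacity  # not enough data to fill the time window yet
    if mean(window) > THRESHOLD:
        capacity += STEP
    return max(MIN_CAPACITY, min(MAX_CAPACITY, capacity))


if __name__ == "__main__":
    print(next_capacity([20, 30, 40, 45, 48], 2))   # average 36.6, stays at 2
    print(next_capacity([60, 70, 80, 90, 95], 2))   # average 79, grows to 3
    print(next_capacity([90, 90, 90, 90, 90], 10))  # capped at the maximum of 10
```

Note that the comparison is strict: an average of exactly 50% does not trigger a scale-out in this model.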

Looking into the automatic decrease and advanced load distribution will be for another time/post. At this time we just did a proof-of-concept to see if the platform would scale up. The integration (“intelligence”) did not go beyond that point.
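As a starting point for that follow-up, a scale-in rule could mirror the scale-out rule. The Python sketch below derives one and prints it as JSON. It was not part of this PoC and has not been tested here: the 30% threshold is a hypothetical value picked for illustration, the helper name make_scale_in_rule is made up, and the rule is trimmed to the fields that matter for the comparison.

```python
import copy
import json

# The scale-out rule from the template, trimmed to the relevant fields.
SCALE_OUT_RULE = {
    "metricTrigger": {
        "metricName": "\\Processor\\PercentProcessorTime",
        "timeGrain": "PT1M",
        "statistic": "Average",
        "timeWindow": "PT5M",
        "timeAggregation": "Average",
        "operator": "GreaterThan",
        "threshold": 50.0,
    },
    "scaleAction": {
        "direction": "Increase",
        "type": "ChangeCount",
        "value": "1",
        "cooldown": "PT1M",
    },
}


def make_scale_in_rule(scale_out_rule, threshold):
    """Derive a mirrored scale-in rule; threshold is an assumed value."""
    rule = copy.deepcopy(scale_out_rule)
    rule["metricTrigger"]["operator"] = "LessThan"
    rule["metricTrigger"]["threshold"] = threshold
    rule["scaleAction"]["direction"] = "Decrease"
    return rule


if __name__ == "__main__":
    # 30.0 is a hypothetical threshold, kept well below 50 to avoid flapping.
    print(json.dumps(make_scale_in_rule(SCALE_OUT_RULE, 30.0), indent=4))
```

Keeping a gap between the two thresholds is a design choice to reduce the chance that the set keeps bouncing between scaling out and scaling in.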



  • The “Virtual Machine Scale Sets” of Azure provide a very nice system that could enable you to create your own “Containers-as-a-Service” (CaaS) platform.
  • The scale-up went as planned. During this post we saw that new nodes were added, and that those nodes were used for freshly deployed containers.
  • The scale-down and advanced load distribution were out of scope for this test, though they are of course feasible!
  • Containers & “CaaS” are not as hard as one would think, thanks to Azure & Rancher!

One thought on “Autoscaling Docker hosts on Azure with Virtual Machine Scale Sets & Rancher”

  1. Thanks for the post it has been very helpful for me! I believe you need to include the creation of rancherApi variable. It also maybe helpful to mention that “extensionProfile” needs to be nested within properties section of the “Microsoft.Compute/virtualMachineScaleSets”.
