Introduction
A while back, Mark Russinovich announced the public preview of "Virtual Machine Scale Sets":
VM Scale Sets are an Azure Compute resource you can use to deploy and manage a collection of virtual machines as a set. Scale sets are well suited for building large-scale services targeting big compute, big data, and containerized workloads – all of which are increasing in significance as cloud computing continues to evolve. Scale set VMs are configured identically, you just choose how many you need, which enables them to scale out and in rapidly and automatically.
So here we have a cloud service that enables us to autoscale our hosts based on the load of the underlying systems. Now imagine combining this feature with Docker… I don't know about you, but I'm excited about this premise! And when combining it with Rancher, you could build your own Containers-as-a-Service (CaaS)! Today we'll be delving into the matter to see how to implement this…
The Design
Below is a quick extract from the ARM Resource Visualizer, loaded with the ARM template I have prepared for this deep dive.
So what will we be deploying?
- A virtual network
- A load balancer with a public IP
- A virtual machine scale set with auto scaling
- Dynamic NAT rules for access to the autoscaled systems
ARM Template
As for the template, let's start with the following quickstart template ; https://github.com/Azure/azure-quickstart-templates/tree/master/201-vmss-ubuntu-autoscale
Next up is to enhance this template with some additional extensions. Note that the "extensionProfile" block below sits inside the "virtualMachineProfile" under the properties of the "Microsoft.Compute/virtualMachineScaleSets" resource, and that the Rancher registration URL is passed in via an extra "rancherApi" parameter, which you need to add to the template's parameters section yourself ;
"extensionProfile": { "extensions": [ { "name": "LinuxDiagnostic", "properties": { "publisher": "Microsoft.OSTCExtensions", "type": "LinuxDiagnostic", "typeHandlerVersion": "2.1", "autoUpgradeMinorVersion": false, "settings": { "xmlCfg": "[base64(concat(variables('wadcfgxstart'),variables('wadmetricsresourceid'),variables('wadcfgxend')))]", "storageAccount": "[variables('diagnosticsStorageAccountName')]" }, "protectedSettings": { "storageAccountName": "[variables('diagnosticsStorageAccountName')]", "storageAccountKey": "[listkeys(variables('accountid'), variables('apiVersion')).key1]", "storageAccountEndPoint": "https://core.windows.net" } } }, { "name": "Docker", "properties": { "publisher": "Microsoft.Azure.Extensions", "type": "DockerExtension", "typeHandlerVersion": "1.0", "autoUpgradeMinorVersion": true, "settings": { "docker": { "port": "2375" } } } }, { "name": "RancherAgent", "properties": { "publisher": "Microsoft.OSTCExtensions", "type": "CustomScriptForLinux", "typeHandlerVersion": "1.4", "autoUpgradeMinorVersion": false, "settings": { "fileUris": [ "https://raw.githubusercontent.com/kvaes/docker-rancher-scripts/master/agent-with-local-ip/rancher.sh" ], "commandToExecute": "[concat('./rancher.sh ', parameters('rancherApi'))]" } } } ] } } } },
What do we see here? Three extensions… The first is the diagnostics extension, which is used by the autoscaling mechanism. The next is the Docker extension, which will properly install Docker. Normally I would add a Docker Compose extension to that list, though due to technical limitations I've added a custom script extension instead, which deploys the Rancher agent. And, of course, afterwards I used this to deploy my Rancher nodes.
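The rancher.sh script that the custom script extension pulls in is linked above; I won't paste it verbatim here, but conceptually it boils down to the standard Rancher agent registration with the host's local IP passed along. A rough sketch of such a script (my reconstruction, not the exact file; the agent version and the IP detection are assumptions):

#!/bin/bash
# Hypothetical sketch: register this host as a Rancher agent, advertising its local IP.
# $1 is the Rancher registration URL, handed in via parameters('rancherApi').
RANCHER_URL="$1"

# Determine the local (private) IP of this scale set instance.
LOCAL_IP=$(ip -4 addr show eth0 | grep -oP '(?<=inet\s)\d+(\.\d+){3}')

# Run the Rancher agent container; CATTLE_AGENT_IP makes Rancher use the local IP.
sudo docker run -d --privileged \
  -e CATTLE_AGENT_IP="${LOCAL_IP}" \
  -v /var/run/docker.sock:/var/run/docker.sock \
  rancher/agent:v1.0.1 "${RANCHER_URL}"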
How does this look in Azure?
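For completeness: kicking off this deployment from the command line could look roughly like this (a sketch assuming the current Azure CLI; the resource group name, file name and parameter values are purely illustrative):

# Create a resource group and deploy the customized scale set template into it.
az group create --name kvaes-vmss-rg --location westeurope
az group deployment create \
  --resource-group kvaes-vmss-rg \
  --template-file azuredeploy.json \
  --parameters vmSSName=kvaesvmss instanceCount=2 rancherApi=http://<rancher-server>:8080/v1/scripts/<registration-token>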
The deployment succeeded…
And our resources are all there…
Here you can see the virtual machine scale set (kvaesvmss), several storage accounts on which the nodes of the scale set will be stored (…kvaesvmsssa), the virtual network in which they are deployed, and a load balancer (with external IP) which is used as the external entry point.
So what details do we get on the virtual machine scale set? Not much, to be honest… We can see the size and the capacity. (Sidenote: check an earlier post to see how the capacity can be changed manually.)
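The same size and capacity information can also be pulled from the command line; a sketch, again assuming the az CLI and the illustrative resource group name from before:

# Show the SKU (VM size, tier and current capacity) of the scale set.
az vmss show --resource-group kvaes-vmss-rg --name kvaesvmss --query sku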
And the load balancer is also prepared for our pool…
And how does this look in Rancher?
The two hosts have joined our farm. As a test, I've deployed Weave Scope on them to verify that everything went fine…
And the network test between both hosts also went fine… Here I did a ping from the network agent of “kvaesvmss-1” to the one of “kvaesvmss-0”.
Manual scaling…
Next up is the manual scaling. We’re going to issue a request to manually (down)scale to one node.
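For reference, such a request could be issued with the az CLI along these lines (a sketch; the resource group name is illustrative):

# Scale the scale set down to a single instance.
az vmss scale --resource-group kvaes-vmss-rg --name kvaesvmss --new-capacity 1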
And the request was executed fine.
This results in a node becoming unreachable for Rancher… which is kind of normal, as it got deleted in Azure. 🙂
So let’s clean that one up… Sadly we have to do this manually, as there is no automatic cleanup in Rancher (for the time being).
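The cleanup itself can be done through the Rancher UI (deactivate the host, then delete it), or against the Rancher API. A rough sketch of the API route, assuming Rancher's v1 API action endpoints and with placeholder host id and API keys:

# Deactivate and then remove the orphaned host via the Rancher v1 API (placeholders throughout).
curl -s -u "${RANCHER_ACCESS_KEY}:${RANCHER_SECRET_KEY}" \
  -X POST "http://<rancher-server>:8080/v1/hosts/<host-id>/?action=deactivate"
curl -s -u "${RANCHER_ACCESS_KEY}:${RANCHER_SECRET_KEY}" \
  -X POST "http://<rancher-server>:8080/v1/hosts/<host-id>/?action=remove"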
Autoscaling
For our autoscaling test bed, I'll be using the "acs-logging-test" containers from Ross Gardler. He demoed these in one of the roadshows for the Azure Container Service, and it is a nice setup to demo the power of Docker.
So I've created a separate stack for this test and deployed the "producer" ("acs-logging-test-simulate") and the "consumer" ("acs-logging-test-analyze"). The "producer" pushes messages onto the queue. The "consumer" picks the messages up from the queue, analyzes them and updates the results in the table.
As we can see, the system is pretty idle after deploying these two containers.
So let’s crank up the noise!
By increasing the scale of the “producer” from 1 to 32 container instances.
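If you want to do this from the command line instead of the UI, rancher-compose mirrors the docker-compose syntax for scaling. A sketch, assuming the stack's compose file is in the current directory and that the service is called "simulate" (both assumptions):

# Point rancher-compose at the Rancher server (values are placeholders).
export RANCHER_URL=http://<rancher-server>:8080
export RANCHER_ACCESS_KEY=<access-key>
export RANCHER_SECRET_KEY=<secret-key>

# Scale the "producer" service from 1 to 32 containers.
rancher-compose --project-name acs-logging-test scale simulate=32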
We can immediately see that Rancher starts deploying new containers to satisfy the scale we requested.
And to stress the system a bit more…
Or maybe A LOT… 😀
And after a while we can see that the CPU of our system is becoming saturated.
And then the auto scale kicks in…
And keeps kicking in…
Autoscale parameters?
For this PoC, we used the following config in our ARM template ;
{ "type": "Microsoft.Insights/autoscaleSettings", "apiVersion": "2015-04-01", "name": "autoscalewad", "location": "[parameters('resourceLocation')]", "dependsOn": [ "[concat('Microsoft.Compute/virtualMachineScaleSets/', parameters('vmSSName'))]" ], "properties": { "name": "autoscalewad", "targetResourceUri": "[concat('/subscriptions/',subscription().subscriptionId, '/resourceGroups/', resourceGroup().name, '/providers/Microsoft.Compute/virtualMachineScaleSets/', parameters('vmSSName'))]", "enabled": true, "profiles": [ { "name": "Profile1", "capacity": { "minimum": "1", "maximum": "10", "default": "1" }, "rules": [ { "metricTrigger": { "metricName": "\\Processor\\PercentProcessorTime", "metricNamespace": "", "metricResourceUri": "[concat('/subscriptions/',subscription().subscriptionId, '/resourceGroups/', resourceGroup().name, '/providers/Microsoft.Compute/virtualMachineScaleSets/', parameters('vmSSName'))]", "timeGrain": "PT1M", "statistic": "Average", "timeWindow": "PT5M", "timeAggregation": "Average", "operator": "GreaterThan", "threshold": 50.0 }, "scaleAction": { "direction": "Increase", "type": "ChangeCount", "value": "1", "cooldown": "PT1M" } } ] } ] } } ] }
Here we can see that the rule is based upon the average CPU usage (>50%). It will scale up in steps of one host. The minimum of the scale set has been set to 1 and the maximum to 10.
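To follow the effect of these rules while the load test runs, you can simply poll the scale set from the command line (a sketch, az CLI, with the illustrative resource group name from before):

# Poll the current capacity of the scale set every minute.
watch -n 60 az vmss show --resource-group kvaes-vmss-rg --name kvaesvmss --query sku.capacity
# Or list the individual instances as they get added.
az vmss list-instances --resource-group kvaes-vmss-rg --name kvaesvmss --output table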
Looking into the automatic decrease and advanced load distribution will be for another time/post. At this time we just did a proof-of-concept to see if the platform would scale up. The integration (“intelligence”) did not go beyond that point.
TL;DR
- The “Virtual Machine Scale Sets” of Azure provide a very nice system that could enable you to create your own “Containers-as-a-Service” (CaaS) platform.
- The scale-up went as planned. During this post we saw that new nodes were added, and that those nodes were used for freshly deployed containers.
- The scale-down and advanced load distribution were out of scope for this test, though they are of course feasible!
- Containers & “CaaS” are not that hard as one would think thanks to Azure & Rancher!