Serverless On-Demand Scaling : Pushing the pedal when you need it…


A lot of workloads are driven by peak consumption. In my experience, workloads with a constant performance need are in the minority. Now here comes the interesting opportunity when leveraging serverless architectures… Here you only pay for your actual consumption. So if you tweak your architecture to leverage this, then you can get huge gains!

For today’s post, I’ll be using VMchooser once again as an example. A lot has changed since the last post on the anatomy of this application. Here is an updated drawing of the high level architecture ;

Underneath you can see the flow that’ll be used when doing a “Bulk Mapping” (aka “CSV Upload”). The webapp (“frontend”) will store the CSV as a blob on the storage account. Once a new blob arrives, a function will be triggered that will examine the CSV file and put every entry onto a queue. Once a message is published onto the queue, another function will start processing this message. By using this pattern, I’m transforming this job into a parallel processing job where each entry is handled (about) simultaneously. The downside of this is that there will be contention/competition for the back-end resources (being the data store). Luckily, CosmosDB can scale on the fly too… We can adapt the request units as needed; up or down! So let’s do a small PoC and see how this could work…
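
To make the fan-out step a bit more tangible, here is a minimal Python sketch of the “one queue message per CSV entry” idea. It is not the actual VMchooser code: the function name csv_to_messages and the sample columns are assumptions for illustration, and the real function would hand each message to the storage queue instead of printing it.

```python
import csv
import io
import json


def csv_to_messages(csv_text: str) -> list:
    """Turn every CSV row into one JSON-encoded queue message."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [json.dumps(row) for row in reader]


# Hypothetical sample; the real VMchooser CSV columns may differ.
sample = "name,cores,memory\nvm1,2,8\nvm2,4,16\n"
messages = csv_to_messages(sample)

for message in messages:
    # In the real flow, each message would be put onto the queue here,
    # where a second function picks it up and processes it independently.
    print(message)
```

Because every row becomes its own message, the number of messages (and thus the number of parallel executions) equals the number of entries in the CSV.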


Bulk Mapping : Pushing the pedal to the metal when needed!

In all honesty, VMchooser has a modest number of visitors (a few hundred per week). So the resources needed to accommodate these users are very slim. Though the “Bulk Mapping” feature is very power-hungry due to the parallel processing pattern I’m using. Why am I using this? Basically to improve the delivery time by a lot! If I did a typical sequential batch, then we’d quickly be talking about minutes to get things done for a rather large list. Though when doing this in parallel, the delivery time can be about the same for a small or a large list.

The flip side of the coin is that I needed to provision my CosmosDB for this peak load. That worked quite nicely from a technical & lazy point of view. Though it wasn’t ideal for my budget… As I know my usage pattern and am able to “predict” when a load comes, I can scale up (and afterwards down) my CosmosDB to handle this. By doing this, I can keep my budget lean & mean. 😉

You might wonder how I can “predict” this load? The trigger is pretty straightforward… Once the function gets triggered to process a CSV file, I know a load is coming. So I’m going to use the number of entries in the CSV file as guidance. By multiplying this value by the average number of request units an entry typically takes, I can proactively scale my CosmosDB to handle this load.
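
As a rough illustration of that calculation, the sketch below multiplies the number of entries by an average RU cost and turns it into a throughput value. The function name, the average of 20 RU per entry and the 10,000 RU cap are assumptions picked for the example, not the values VMchooser really uses. The 400 RU floor is the minimum mentioned in this post, and the rounding reflects that provisioned throughput is set in steps of 100 RU/s.

```python
import math

MIN_RU = 400   # bare minimum used in this post
RU_STEP = 100  # provisioned throughput is set in increments of 100 RU/s


def target_request_units(entries: int, avg_ru_per_entry: int, max_ru: int = 10_000) -> int:
    """Estimate the throughput to provision for an upcoming bulk load.

    avg_ru_per_entry and max_ru are hypothetical tuning values.
    """
    needed = entries * avg_ru_per_entry
    rounded = math.ceil(needed / RU_STEP) * RU_STEP
    # Never go below the minimum, never above the (assumed) budget cap.
    return max(MIN_RU, min(rounded, max_ru))


print(target_request_units(259, 20))  # 259 * 20 = 5180, rounded up to 5200
print(target_request_units(3, 20))    # a tiny upload stays at the 400 floor
```

The cap is there purely as a budget safety net: a very large CSV should not be able to scale the database (and the bill) without limit.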

In terms of scaling back, this is a kind of “cronjob” at the moment. This has its own disadvantages… Though at the moment it fulfils the requirements and characteristics needed. So until I hit the limitations of this system, I’ll probably keep this simple scale-back mechanism. 😉
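
One obvious risk with a fixed timer is that it can fire while a bulk job is still running. A possible guard, which is not what the current setup does, is to let the timer skip the reset when a scale-up happened recently. The sketch below only shows that decision logic; the ten minute cool-down and the way the last scale-up time is tracked are assumptions.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

MIN_RU = 400
COOL_DOWN = timedelta(minutes=10)  # hypothetical grace period after a scale-up


def ru_after_timer_tick(current_ru: int, last_scale_up: Optional[datetime], now: datetime) -> int:
    """Decide which throughput the scale-back timer should leave in place."""
    if last_scale_up is not None and now - last_scale_up < COOL_DOWN:
        # A load was announced recently: keep the current scale for now.
        return current_ru
    # No recent scale-up: fall back to the bare minimum.
    return MIN_RU


now = datetime(2020, 1, 1, 12, 0, tzinfo=timezone.utc)
print(ru_after_timer_tick(10_000, now - timedelta(minutes=2), now))   # 10000
print(ru_after_timer_tick(10_000, now - timedelta(minutes=30), now))  # 400
```

The trade-off is that the database stays at the higher (more expensive) scale a little longer after every upload.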


Demo Time!

Now let’s see how this works in reality! So we’re going to do a bulk mapping…

I’ve tweaked the sample CSV a bit and now I have a CSV with 259 VMs inside… Ready to be parsed!

Before I press “Upload…”, we’re going to check the scale of our CosmosDB ;

You’ll notice that it’s in a fixed tier at the moment and running at the bare minimum of 400 RUs.

Even the metrics show that it’s currently idle… Now let’s check the two functions responsible for the scaling. My “cronjob” (RescaleCosmosDB) runs every 5 minutes and resets the RU to the bare minimum needed (thus 400 RUs).

The other function is exposed as an API and is only triggered when needed. Here we can see that its last run was a day ago, when it increased the scale to 5400 RU.


Now let’s push “Upload…”! We get a nice message that we need to wait a few minutes.

Though let’s see what is happening under the hood. Our first function, that is triggered by the arrival of the blob, will parse the CSV file. Here it will put all entries onto a queue (one message per entry) and it will also scale-up our CosmosDB.

Let’s take a look at our scaling function… Yes, it was hit and changed the RU to 10k.

As the messages are put onto the queue, another function will be triggered… Generating 259 parallel executions, all hitting our CosmosDB at the back-end.

Let’s take a peek at our CosmosDB… and yes, it was scaled to 10k RU!

There we can see that our functions are doing their magic, processing the information from the CSV file.

Once done, let’s go back to VMchooser and check the output.

As you might expect, even this is an API call backed by a function…

Which will display the results ;

And we can even download these, for further tweaking, if you want… 😉

Let’s browse back to our CosmosDB, and we’ll notice that it was scaled back to 400RU.

That happened because our “every-5-minute-cronjob” function was triggered.

What about our CosmosDB metrics? We can see that we got a peak due to the workload (259 VMs), which triggered a few requests per entry towards our CosmosDB.

Though it seems that I was a bit too pessimistic with the actual usage it would cause.

So let’s say I still have room for optimization here! 😉


Closing Thoughts

As always, some closing thoughts to end this post…

  • The pricing model behind serverless-based services is very scalable. This is something you can tweak to your advantage! Don’t be blind to this opportunity and incorporate it into your architecture.
  • Creating a scale-out/in orchestration isn’t easy. The way I’m doing it here has many pitfalls… If a scale-down happens right after the scale-up, then my job will suffer. In this case, my error handling will need to be able to cover this situation.
  • When going parallel, do not underestimate the impact of error handling. Once you operate at scale, assume contention for resources and think about how to deal with it.
  • Monitoring will become your best friend! 😉 It will teach you about your typical operations, but also provide information towards things that impact your cost model.
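
To illustrate the error handling point: when the database is scaled down mid-job or simply overloaded, requests get throttled (CosmosDB answers with an HTTP 429, “request rate too large”). A common way to cope is to retry with exponential backoff. The sketch below is a generic version of that idea and not the VMchooser implementation; ThrottledError and call_with_backoff are hypothetical names, and the sleep function is passed in so the behaviour is easy to test.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class ThrottledError(Exception):
    """Hypothetical stand-in for a 'request rate too large' (HTTP 429) response."""


def call_with_backoff(
    operation: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run operation, retrying with exponential backoff while it is throttled."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except ThrottledError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: let the caller (or the queue) handle it
            # Wait 0.5s, 1s, 2s, ... plus some jitter, so the parallel
            # workers do not all retry at exactly the same moment.
            sleep(base_delay * (2 ** attempt) + random.uniform(0, base_delay))
    raise ValueError("max_attempts must be at least 1")
```

Wrapping each database call of a queue-triggered worker in something like call_with_backoff gives the scale-up a moment to take effect before the message is marked as failed.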

Anyhow, I hope you enjoyed this post!


Appendix ; Code Samples

If you are interested in the code behind the scaling functions…
