Remember the last time you went shopping for a shirt? Then you surely also recall the moment in time when you were looking to find the right shirt size for yourself…
You probably also noticed that sizes might differ a bit depending on the context. A women’s size vs. a men’s size is totally different. There are geographical differences… and some people just like to wear clothes that have more “free space”.
So is today’s post about buying clothes? Hell no… 😉 But it’s to point out that there are analogies between finding the right shirt for you, and finding the right Azure Virtual Machine. Today we’ll delve into the aspects that will guide you towards a given T-shirt size in Azure; for instance, why choose an F1s over an A1_v2, when they both have 1 core & 2GB of memory, even though there is a price difference of 10€ per month between them?
A core is not a core…
You are probably as tired as me of hearing the “compare apples to apples”-statement. Though it is truly important to note that there is more to comparing processing power than the number of cores. There are two main dimensions that might cause a different experience for your workload;
- CPU Family : The first part is probably the easy part. The physical CPU that provides the cores has a big impact on the performance. Just take a look at CPU benchmarks, and you’ll clearly see a difference in terms of performance. That’s why Azure provides a value which allows you to benchmark Azure VMs against each other, which is called “ACU” (Azure Compute Units). But I’ll come back to that one in a bit.
- Overprovisioning Ratio : When I did sizings for On Premises virtualization farms in the past, I typically utilized an over-subscription/provisioning ratio of 1:4 (to reduce costs). This meant that every physical core was supporting up to four virtual cores. In Azure, with one exception being the A0, there is no overprovisioning… Each physical core is only supporting one virtual core.
So the “punch” a core packs (basically the processing power that comes with it) and the oversubscription/provisioning ratio both have a huge impact on your price.
What about that ACU thingie? Basically this is your comparison guideline… Let’s take a look at the D_v2 machines. They rely on an E5-2673 v3 (Haswell) processor underneath. When checking the ACU for that machine family (D_v2), we notice that the published ACU is set between 210 & 250. Here you are guaranteed 210 ACU, though through Intel’s Turbo technology you might peak towards 250 ACU. So if your current On Premises machine has the same CPU, and there is no overprovisioning on your side, then you can compare the cores in a 1:1 manner. If you are overprovisioning at a ratio of 1:4, however, then you are looking at a profile between 52,5 & 250 ACU. Why those numbers? Worst case you’ll get 1/4th of the core due to the oversubscription (210 / 4 = 52,5). Though if there aren’t any noisy neighbours on your On Premises setup, then you’ll still be able to boost to 250 ACU. So comparing the cores is still kinda a hard thing to do, no? 😉
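To make the arithmetic above a bit more tangible, here is a minimal Python sketch. The function name `effective_acu_range` and its parameters are my own invention for illustration (they are not part of any Azure tooling), and it assumes the simple model from the paragraph above: worst case you get an equal share of the guaranteed ACU, best case you get the full turbo ACU.

```python
def effective_acu_range(guaranteed_acu, turbo_acu, vcpus_per_core=1):
    """Return (worst_case, best_case) ACU for one virtual core.

    Hypothetical helper: assumes that, in the worst case, the guaranteed
    ACU of a physical core is split evenly over all virtual cores mapped
    onto it, and that the best case is the full turbo ACU (no noisy
    neighbours).
    """
    worst_case = guaranteed_acu / vcpus_per_core
    best_case = turbo_acu
    return worst_case, best_case


# D_v2 figures from the text: 210 guaranteed, 250 with turbo.
azure_like = effective_acu_range(210, 250, vcpus_per_core=1)
on_prem_1_to_4 = effective_acu_range(210, 250, vcpus_per_core=4)

print("No overprovisioning:", azure_like)
print("1:4 overprovisioning:", on_prem_1_to_4)
```

The point of the sketch is that the upper bound stays the same, but the lower bound (what you can actually count on) drops with the ratio.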
Premium vs Standard Disks
Next up… Is your machine going to need premium disks or not? Azure premium storage delivers high-performance, low-latency disk support for virtual machines (VMs) with input/output (I/O)-intensive workloads. If the answer is yes, then this will limit the number of VM families you can use. Our basic advice is to use premium disks for your business critical workloads (aka “Production”) and for everything that is latency sensitive (aka Databases). General purpose, or standard, storage can have situations where the latency might “spike”. In reality, this is no different from your On Premises situation. If you do not have an “All Flash Array” SAN, then you can compare your On Premises situation to “Standard”-storage.
When taking a look at disk performance, we should first start off with the difference between “Standard” & “Premium” storage ;
- Standard Storage : You get a maximum of 500 IOPS & 60MB/s throughput per data disk.
- Premium Storage : The IOPS you receive depend on the premium disks you choose & the virtual machine size.
Let’s illustrate what I meant for the premium storage… First off, let’s take a look at the disks ;
Depending on the disk you add, you can go from 25MB/s & 120 IOPS to 250MB/s & 7500 IOPS per disk… Now let’s take a look at the virtual machines, for instance, the GS-series ;
We see 5 t-shirt sizes for the GS-series. The “smallest” can have a maximum of 4 disks and the “largest” can go up to 64 disks. That means that the smallest could add 4 times a P50 (7500 IOPS & 250MB/s). That would bring the total to 30.000 IOPS and 1GB/s throughput. Do note that there are caps on VM level too… So if we are totally pessimistic and go for the “uncached” stats, then we can see that the IOPS limit is set at 5000 and the throughput at 125MB/s. So one P30 would already make us reach the cap for that size. Lesson here ; Be aware that there is a cap on VM level too, and that you aren’t just adding up disks in terms of performance.
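The “disks add up, but only until the VM cap” rule can be sketched in a few lines of Python. This is just an illustration under my own assumptions: the `Perf` type and `effective_disk_perf` function are hypothetical names, and the model simply takes the minimum of the summed disk limits and the VM-level limit. The numbers are the ones quoted above (P50 and the uncached limits of the smallest GS size).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Perf:
    iops: int   # I/O operations per second
    mbps: int   # throughput in MB/s


def effective_disk_perf(disks, vm_cap):
    """Sum the per-disk limits, then clamp the result to the VM-level cap."""
    total_iops = sum(d.iops for d in disks)
    total_mbps = sum(d.mbps for d in disks)
    return Perf(min(total_iops, vm_cap.iops), min(total_mbps, vm_cap.mbps))


# Figures quoted in the text above.
P50 = Perf(iops=7500, mbps=250)
SMALLEST_GS_UNCACHED_CAP = Perf(iops=5000, mbps=125)

four_p50 = [P50] * 4
print("Sum of disks :", Perf(sum(d.iops for d in four_p50),
                             sum(d.mbps for d in four_p50)))
print("What you get :", effective_disk_perf(four_p50, SMALLEST_GS_UNCACHED_CAP))
```

So on paper the four disks deliver 30.000 IOPS, but the VM will only let you use whatever its own cap allows.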
Now on a side note, if you would take the G5 (not the GS5), you will not have the ability to deploy premium disks, but you will also not have these caps!
So for some very specific use cases, you might be better off going for the G-series instead of the GS-series… Just so you’ve got the complete picture. Not to confuse you even more! 😉
In the last pictures about the G & GS series, you notice that the maximum number of NICs & the network bandwidth will also vary between t-shirt sizes. All machines will have the ability to deploy two NICs. So yes, you can choose a small machine and utilize it as a “Network Virtual Appliance” (Firewall, LoadBalancer, … whatever). Though do mind that there are network caps in place too. Yes, I know that the “moderate, high, very high, …” labels kinda suck and that you are looking for numbers. The networking product team gets that feedback on a daily basis… So let’s hope that one gets “fixed” very soon!
Rightsizing & Sizing… Say what?!?
I literally just did a quick image search to find a CPU graph and this is what I found ;
When we analyze the graph, we can already conclude two things ;
- The CPU has a maximum utilization of 55%. Okay given, these are averaged out a bit due to the granularity… So let’s even give it up to 60-65%, where I’m being very lenient.
- The system is barely used between 9PM and 3AM. That means that the system has 6 hours of “idle time”.
The first part, about the max. CPU utilization, is “Rightsizing”. Basically you have an overprovisioned system here. Let’s say that it is a system with 4 cores & 8GB of RAM. Then our cheapest option would be to go for an A4_v2 machine, which is about 115€ per month. Now if we would do rightsizing on that machine, taking into account the 55% utilization, we could say that an A2m_v2 might just suffice. That machine comes down to a monthly price of 78€, which reduces our monthly cost by roughly a third (about 32%)… Now if at a given time we would need additional performance, we could then upgrade to a larger VM. In the cloud we do not need to overprovision our systems up front.
The second part is about “snoozing”. In Azure we pay per minute… So if we do not need the system between 9PM & 3AM, we can just “snooze it” (aka “Shut it down”). That would mean that we reduce the cost of our system by an additional 25%… The beauty of this, is that Azure is built for automation. So you can easily create automation jobs to do this.
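To see how rightsizing and snoozing stack up, here is a small Python sketch using the prices from the example above (115€ for the A4_v2, 78€ for the A2m_v2). The helper `monthly_cost_after_snoozing` is a hypothetical name of mine, and it assumes the bill scales linearly with the hours the machine is running (so it ignores things like disk costs that keep running while the VM is off).

```python
def monthly_cost_after_snoozing(monthly_price, idle_hours_per_day):
    """Scale a monthly price by the fraction of the day the VM is running.

    Simplifying assumption: cost is purely proportional to running time.
    """
    running_fraction = (24 - idle_hours_per_day) / 24
    return monthly_price * running_fraction


original_price = 115.0    # A4_v2, per month (from the text)
rightsized_price = 78.0   # A2m_v2, per month (from the text)

rightsizing_saving = 1 - rightsized_price / original_price
snoozed_price = monthly_cost_after_snoozing(rightsized_price, idle_hours_per_day=6)
total_saving = 1 - snoozed_price / original_price

print(f"Rightsizing saves : {rightsizing_saving:.0%}")
print(f"Price when snoozed: {snoozed_price:.2f} EUR per month")
print(f"Total saving      : {total_saving:.0%}")
```

Under these assumptions the two measures together bring the bill down to about half of the original price.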
Tool : VMchooser
Azure has a lot of VM families & t-shirt sizes within the families. I concur with you… It is sometimes far from easy when doing an exercise to match virtual machines. That’s why I’ve created a tool called “VMchooser” to assist you in this tedious task.
Here you can enter the specifications for each virtual machine and afterwards get the three cheapest matches ;
If you’ve got a rather large batch of virtual machines you want to compare, there is also a “CSV Import” tool which will do batch processing on your entire list! Anyhow, let me know what you think about it… 😉
For today’s post, the key takeaways ;
- A core is not a core. Compare in terms of processing power & overprovisioning ratio.
- Be aware that choosing a VM can have an impact on your disk performance (caps).
- Network throughput & the number of NICs differ per VM size.
- Always do Rightsizing & Snoozing when you move to the cloud! Really… no really!
- There is a tool that can help you find the right VM.
2 thoughts on “What Azure Virtual Machine size should I pick?!?”
Thanks for the great article.
I have a question, though.
When you say “… Let’s take a look at the D_v2 machines. They rely on an E5-2673 v3 (Haswell) processor underneath. ”
How do you know that?
How do you know what is the physical infra?
Check the following docs ; https://docs.microsoft.com/en-us/azure/virtual-machines/windows/sizes
The performance of a virtual machine is typically indicated in ACU to abstract it away from having to worry about the actual CPUs (though they are indicated in the docs) ; https://docs.microsoft.com/en-us/azure/virtual-machines/windows/acu