Microsoft Azure: Benchmark Tests – Storage – How do the different series relate to each other?

Azure currently has different “series” of machines. The A-series are seen as “general purpose” machines, whereas the D-series are targeted at compute optimization. In the US, the G-series has even seen the light of day! Today I want to find out what effect this has on storage performance. Typical IT organizations worry about storage performance in the cloud, as their ERP/BI implementations are “quite eager” to obtain maximum storage performance.

So what will we be covering today?

  • A bit of theory concerning the differences
  • Test Environment Explained
  • Test Method Explained
  • Display of test results
  • Conclusion / analysis of the test results


A bit of theory concerning the differences

What does Microsoft say about its series?

  • General purpose compute (A) – Basic tier: An economical option for development workloads, test servers, and other applications that don’t require load balancing, auto-scaling, or memory-intensive virtual machines.
  • General purpose compute (A) – Standard tier: Offers the most flexibility. Supports all virtual machine configurations and features.
  • Optimized compute (D): 60% faster CPUs, more memory, and local SSD – D-series virtual machines feature solid state drives (SSDs) and 60% faster processors than the A-series and are also available for web or worker roles in Azure Cloud Services. This series is ideal for applications that demand faster CPUs, better local disk performance, or more memory.
  • Performance optimized compute (G): unparalleled computational performance with the latest CPUs, more memory, and more local SSD – G-series virtual machines feature the latest Intel® Xeon® processor E5 v3 family, twice the memory and four times the Solid State Drive storage (SSDs) of the D-series. G-series will provide unparalleled computational performance, more memory and more local SSD storage than any current VM size in the public cloud, making it ideal for your most demanding applications.

Side note: Azure has also released the “DS” series (“Premium Storage”). We won’t be looking into this area, as it is currently still in preview.

Today we’ll check what we can get out of those machines via benchmarking. Be aware that Microsoft is open about the IOPS delivered by each machine; be sure to check out the support article “Virtual Machine and Cloud Service Sizes for Azure”. Depending on the type of virtual machine, you can attach a maximum number of data disks, and each disk is granted a given number of IOPS. The amount granted differs by series: an “A – Basic” is granted 300 IOPS per disk, an “A – Standard”, “D” or “G” is granted 500 IOPS per disk, whereas a “DS” is granted 1600 IOPS per disk.
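A quick way to read these figures: the theoretical IOPS ceiling of a VM is roughly the number of data disks attached multiplied by the per-disk grant. The sketch below is a minimal bash illustration of that arithmetic, using only the per-disk figures quoted above. The series labels and function names are hypothetical, and the disk count is left as an input because the maximum depends on the VM size listed in the sizes article. It says nothing about latency or caching.

```shell
#!/usr/bin/env bash
# Sketch: theoretical IOPS ceiling = data disks x IOPS granted per disk.
# Per-disk figures are the ones quoted in the post; series labels are hypothetical.

# iops_per_disk SERIES -> prints the per-disk IOPS grant
iops_per_disk() {
  case "$1" in
    A-Basic)        echo 300 ;;
    A-Standard|D|G) echo 500 ;;
    DS)             echo 1600 ;;
    *) echo "unknown series: $1" >&2; return 1 ;;
  esac
}

# max_vm_iops SERIES DATA_DISKS -> prints the aggregate ceiling
max_vm_iops() {
  local per_disk
  per_disk=$(iops_per_disk "$1") || return 1
  echo $(( per_disk * $2 ))
}
```

For example, `max_vm_iops D 4` prints 2000, i.e. four data disks at 500 IOPS each.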


Test Environment Explained

We’ll be creating four machines today:

  • TEST-BSC-A1: A1 Basic (West Europe)
  • TEST-STD-A1: A1 Standard (West Europe)
  • TEST-STD-D1: D1 Standard (West Europe)
  • TEST-STD-G1: G1 Standard (West US, as the G-series is currently only available in the US)

Each machine runs Ubuntu 14.04, using the Azure image of 23/01/2015, and is equipped with two benchmarking tools: Bonnie++ and IOzone.

These packages are installed from the Azure Ubuntu repositories as follows. First, make sure all “multiverse” repositories are uncommented, as the iozone3 package lives there.

sudo vi /etc/apt/sources.list

Then update the package list and install both tools:

sudo apt-get update && sudo apt-get install bonnie++ iozone3
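If you prefer not to edit sources.list by hand in vi, the same change can be scripted. This is a minimal sketch, assuming the stock Ubuntu 14.04 layout in which disabled entries look like `# deb http://... trusty multiverse`; the function name is hypothetical. It keeps a `.bak` copy and only touches `deb`/`deb-src` lines that mention multiverse, so the explanatory `##` comments stay commented.

```shell
#!/usr/bin/env bash
# Sketch: uncomment every "deb"/"deb-src" entry that references the
# multiverse component. Lines such as "## N.B. ..." are left alone,
# because after the first "#" they do not continue with "deb".

# enable_multiverse [SOURCES_FILE] -> edits the file in place, keeps FILE.bak
enable_multiverse() {
  local file="${1:-/etc/apt/sources.list}"
  sed -i.bak -E 's/^#[[:space:]]*(deb(-src)?[[:space:]].*[[:space:]]multiverse([[:space:]]|$))/\1/' "$file"
}
```

Run it as root (for example from `sudo -i`), then continue with the `apt-get` line above.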


(Disclaimer: for the test with the G1, I attached an additional disk, as the base OS disk was not large enough to hold the test file. Bonnie++ advises creating a test file twice the size of the memory, to counter caching mechanisms. Update: one error I made was overlooking that host caching is disabled by default on such an additional disk, so some results on the G1 are not aligned with the other tests. This is only relevant to the Bonnie++ tests, not to the IOzone tests.)
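To follow Bonnie++'s twice-the-memory advice without doing the maths by hand, the size can be derived from /proc/meminfo. A minimal sketch, assuming GNU/Linux where MemTotal is reported in kB (KiB); the helper names are hypothetical, while `-s` (file size in MB) and `-r` (RAM size in MB) are standard Bonnie++ options.

```shell
#!/usr/bin/env bash
# Sketch: size the Bonnie++ test file at twice the installed RAM, so the
# page cache cannot hold the whole file. MemTotal in /proc/meminfo is in kB;
# the meminfo path can be overridden (useful for testing).

# bonnie_size_mb [MEMINFO_FILE] -> prints 2 x RAM in MB (truncated)
bonnie_size_mb() {
  local meminfo="${1:-/proc/meminfo}"
  awk '/^MemTotal:/ { printf "%d\n", ($2 * 2) / 1024 }' "$meminfo"
}

# ram_mb [MEMINFO_FILE] -> prints RAM in MB (truncated)
ram_mb() {
  local meminfo="${1:-/proc/meminfo}"
  awk '/^MemTotal:/ { printf "%d\n", $2 / 1024 }' "$meminfo"
}
```

For example: `bonnie++ -d /datadisk -s "$(bonnie_size_mb)" -r "$(ram_mb)"`, where `/datadisk` is a placeholder for wherever the additional disk is mounted. Make sure that disk has enough free space for the file.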


Test Method Explained

Now we are ready to go. On each system, the following commands were executed:

bonnie++ -d /tmp > /tmp/bonnie.txt

iozone -R -l 5 -u 5 -r 4k -s 100m -F /tmp/f1 /tmp/f2 /tmp/f3 /tmp/f4 /tmp/f5 > /tmp/iozone_results.txt

iozone -R -l 5 -u 5 -r 4k -s 100m -F /mnt/f1 /mnt/f2 /mnt/f3 /mnt/f4 /mnt/f5 > /tmp/iozone_results-mnt.txt
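The two IOzone runs differ only in the target directory: /tmp lives on the OS disk, while /mnt is, on Azure Ubuntu images, typically where the local temporary (resource) disk is mounted (an assumption; the post does not say so explicitly). Below is a minimal bash sketch that builds the same argument list for any directory, so both runs stay identical. The function names are hypothetical; the flag meanings in the comments follow the IOzone documentation.

```shell
#!/usr/bin/env bash
# Builds and runs the IOzone invocation used above for any target directory.
# -R: Excel-style report   -l/-u: min/max number of processes
# -r: record size          -s: file size per process   -F: one file per process

# iozone_args DIR [NPROC] -> prints one argument per line
iozone_args() {
  local dir="$1" n="${2:-5}" i
  printf '%s\n' -R -l "$n" -u "$n" -r 4k -s 100m -F
  for (( i = 1; i <= n; i++ )); do
    printf '%s\n' "$dir/f$i"
  done
}

# run_iozone DIR OUTFILE -> runs iozone with those arguments, report to OUTFILE
run_iozone() {
  local -a args
  mapfile -t args < <(iozone_args "$1")
  iozone "${args[@]}" > "$2"
}
```

For example, `run_iozone /mnt /tmp/iozone_results-mnt.txt` reproduces the second IOzone command above.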

So what are we basically going to do? A good description of what IOzone does can be found in the Linux Magazine article “I Feel the Need for Speed: Linux File System Throughput Performance, Part 1”. The highlights:


IOzone is open-source and written in ANSI C. It is capable of single thread, multi-threaded, and multi-client testing. The basic idea behind IOzone is to break up a file of a given size into records. Records are written or read in some fashion until the file size is reached. Using this concept, IOzone has a number of tests that can be performed:

  • Write: This is a fairly simple test that simulates writing to a new file. Because of the need to create new metadata for the file, many times the writing of a new file can be slower than rewriting to an existing file. The file is written using records of a specific length (either specified by the user or chosen automatically by IOzone) until the total file length has been reached.

  • Re-write: This test is similar to the write test but measures the performance of writing to a file that already exists. Since the file already exists and the metadata is present, it is commonly expected for the re-write performance to be greater than the write performance. This particular test opens the file, puts the file pointer at the beginning of the file, and then writes to the open file descriptor using records of a specified length until the total file size is reached. Then it closes the file, which updates the metadata.

  • Read: This test reads an existing file. It reads the entire file, one record at a time.

  • Re-read: This test reads a file that was recently read. This test is useful because operating systems and file systems will maintain parts of a recently read file in cache. Consequently, re-read performance should be better than read performance because of the cache effects. However, sometimes the cache effect can be mitigated by making the file much larger than the amount of memory in the system.

  • Random Read: This test reads a file with the accesses being made to random locations within the file. The reads are done in record units until the total reads are the size of the file. The performance of this test is impacted by many factors including the OS cache(s), the number of disks and their configuration, disk seek latency, and disk cache among others.

  • Random Write: The random write test measures the performance when writing a file with the accesses being made to random locations within the file. The file is opened to the total file size and then the data is written in record sizes to random locations within the file.

  • Backwards Read: This is a unique file system test that reads a file backwards. There are several applications, notably, MSC Nastran, that read files backwards. There are some file systems and even OSes that can detect this type of access pattern and enhance the performance of the access. In this test a file is opened and the file pointer is moved 1 record forward and then the file is read backward one record. Then the file pointer is moved 2 records backward in the file, and the process continues.

  • Record Rewrite: This test measures the performance when writing and re-writing a particular spot within a file. The test is interesting because it can highlight “hot spot” capabilities within a file system and/or an OS. If the spot is small enough to fit into the various cache sizes (CPU data cache, TLB, OS cache, file system cache, etc.), then the performance will be very good.

  • Strided Read: This test reads a file in what is called a strided manner. For example, you could read data starting at a file offset of zero, for a length of 4 KB, then seek 200 KB forward, then read for 4 KB, then seek 200 KB, and so on. The constant pattern is important and the “distance” between the reads is called the stride (in this simple example it is 200 KB). This access pattern is used by many applications that are reading certain data structures. This test can highlight interesting issues in file systems and storage because the stride could cause the data to miss any striping in a RAID configuration, resulting in poor performance.

  • Fwrite: This test measures the performance of writing a file using a library function “fwrite()”. It is a binary stream function (examine the man pages on your system to learn more). Equally important, the routine performs a buffered write operation. This buffer is in user space (i.e. not part of the system caches). This test is performed with a record length buffer being created in a user-space buffer and then written to the file. This is repeated until the entire file is created. This test is similar to the “write” test in that it creates a new file, possibly stressing the metadata performance.

  • Frewrite: This test is similar to the “rewrite” test but using the fwrite() library function. Ideally the performance should be better than “Fwrite” because it uses an existing file so the metadata performance is not stressed in this case.

  • Fread: This is a test that uses the fread() library function to read a file. It opens a file, and reads it in record lengths into a buffer that is in user space. This continues until the entire file is read.

  • Freread: This test is similar to the “reread” test but uses the “fread()” library function. It reads a recently read file which may allow file system or OS cache buffers to be used, improving performance.
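Two of the ideas above, writing a file record by record and the Strided Read pattern, can be pictured with plain dd. This is a minimal bash sketch for illustration only, not how IOzone implements its tests; the function names are hypothetical. It assumes, following the example in the Strided Read description, that each seek is measured from the end of the previous read, so 4 KB reads with 200 KB seeks start at 0, 204 and 408 KB. IOzone's own stride setting may define the distance differently.

```shell
#!/usr/bin/env bash
# Illustration only: plain dd models the access patterns; IOzone does the real measuring.
# All sizes are in KB (1 KB = 1024 bytes).

# write_in_records FILE RECORD_KB FILE_KB
# Writes FILE as FILE_KB / RECORD_KB records of RECORD_KB each (the Write test idea).
write_in_records() {
  local file="$1" record_kb="$2" file_kb="$3"
  local count=$(( file_kb / record_kb ))
  # conv=fsync flushes the data to disk before dd prints its throughput line
  dd if=/dev/zero of="$file" bs=$(( record_kb * 1024 )) count="$count" conv=fsync 2>&1 | tail -n 1
}

# strided_offsets READ_KB SEEK_KB COUNT
# Prints the start offset (KB) of each read, assuming each seek starts
# where the previous read ended.
strided_offsets() {
  local read_kb="$1" seek_kb="$2" count="$3" i
  for (( i = 0; i < count; i++ )); do
    echo $(( i * (read_kb + seek_kb) ))
  done
}

# strided_read FILE READ_KB SEEK_KB COUNT
# Performs those reads with dd, using 1 KB blocks so skip/count are in KB.
strided_read() {
  local file="$1" read_kb="$2" seek_kb="$3" count="$4" off
  for off in $(strided_offsets "$read_kb" "$seek_kb" "$count"); do
    dd if="$file" of=/dev/null bs=1024 count="$read_kb" skip="$off" 2>/dev/null
  done
}
```

For example, `write_in_records /tmp/f1 4 102400` writes a 100 MB file in 4 KB records, similar to one process of the `-r 4k -s 100m` runs above, and `strided_offsets 4 200 3` prints 0, 204 and 408.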

When taking a look at Bonnie++, check out this article by Textuality. My objective is to gain proper insight into the latencies with Bonnie++ and to use IOzone for the actual throughput.


Display of test results


[Figures: Bonnie++ results, two charts (Bonnie.xlsx)]

[Figures: IOzone results, two charts (iometer.xlsx)]

Download Raw Results Files

Conclusion / analysis of the test results

So what have we learned today?

  • The latency of the A-series is significantly higher than that of the D/G-series.
  • There is a performance difference between the “Basic” and “Standard” tiers of the A-series.
  • Whilst the D-series outperforms the A-series, the G-series leaves all the others in the shade.
  • There is a performance answer for every workload… just choose wisely!

