Migrating MySQL data to Azure SQL with Azure Data Factory

Introduction

Earlier this week I migrated “storage.kvaes.be” towards Azure. It was long overdue… I was also determined to change the backend to Azure Table Storage. Are there better setups for this? Yes, there are! Though I wanted to get myself a bit more familiar with table storage from PHP, so I thought it would be a nice test. 🙂 Anyhow, for the actual data migration I used a combination of manual mutations & Data Factory. I’ve used Azure Data Factory a few times before, and it has always pleased me.

That brings me to today’s post, where I’ll do a quick run-through of how you can use Azure Data Factory to migrate your MySQL database towards an Azure SQL Database (or any other supported target).
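As a small teaser: the copy activity in Data Factory typically needs a table on the Azure SQL side to land the rows in, so a first step is usually to pre-create the target schema. A minimal sketch; the table below is purely illustrative and not the actual storage.kvaes.be schema:

-- Hypothetical target table on the Azure SQL Database side, created up front
-- so the Data Factory copy activity from MySQL has a destination to write to.
CREATE TABLE dbo.Files
(
    FileId     INT           NOT NULL PRIMARY KEY,
    FileName   NVARCHAR(255) NOT NULL,
    SizeBytes  BIGINT        NOT NULL,
    UploadedOn DATETIME2(0)  NOT NULL
);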

Continue reading “Migrating MySQL data to Azure SQL with Azure Data Factory”

Ever tried the mix of Azure, SQL Server, Storage Spaces Direct & Scale Out File Server?

Introduction

A while back I posted a blog post on how to set up a highly available SQL cluster on Azure using SIOS DataKeeper. As I’m an avid believer in Storage Spaces, I was looking for a moment to test drive “Storage Spaces Direct” on Azure. Today’s blog post will cover that journey…

UPDATE (01/02/2017): At this point there is no official support for this solution, so do not implement it for production yet. As soon as this changes, I’ll update this post accordingly!

UPDATE (08/02/2017): New official documentation has been released, though I cannot find official support statements.

UPDATE (30/03/2017): A few days after the previous update, the following post was published => Deploying IaaS VM Guest Clusters in Microsoft Azure

 

Solution Blueprint

What do we want to build today?

  • A two-node cluster which will be used as a Failover Cluster Instance for MSSQL.
  • As a quorum, we’ll be using the cloud witness feature of Windows Server 2016 in combination with an Azure storage account.
  • In regards to storage, we’ll create a Scale Out File Server setup which will leverage the local disks of the two servers via Storage Spaces Direct.
  • To achieve a “floating IP”, we’ll be using the Azure Load Balancer setup (as we did in the last post).

[Image: solution blueprint of the SQL cluster with Storage Spaces Direct & Scale Out File Server on Azure]

 

Continue reading “Ever tried the mix of Azure, SQL Server, Storage Spaces Direct & Scale Out File Server?”

MSSQL : What should I know about licensing HA/DR scenarios in the cloud?

Introduction

This week I met a customer who described the following situation to me…

[Image: excerpt from the SQL Server 2016 Licensing Guide]

Where, to be honest, I was a bit flabbergasted, as I could not imagine this being true. So I read up on the matter to see where this could possibly come from.

Over the course of the last few years, I’ve been in contact with licensing way too much. Though in the end, this can really make a lot of difference in euros/dollars/… And that is what matters to the business!

Update (31 Oct 2019) : https://cloudblogs.microsoft.com/sqlserver/2019/10/30/new-high-availability-and-disaster-recovery-benefits-for-sql-server/

 

Bibliography

Everything I’m going to say in this post uses the following posts as a foundation:

Continue reading “MSSQL : What should I know about licensing HA/DR scenarios in the cloud?”

Behind the scenes : Creating a Microsoft SQL Server as a Windows / Docker Container

Introduction

This post is the first of a series in my journey to build a flexible, production-ready MSSQL Windows container. I thought this would be a breeze given my experience with Docker on Linux, though I must admit I ran into multiple issues… This post will not provide you with a working container, as I’m still developing that one.

[Image: Remote Desktop session on the container host]

Once I deem it production ready, it’ll be released to the community to be used freely. Though I want it to meet my personal quality standards, meaning it should be stable and flexible enough to run in production.

 

Blueprint Braindump

For those who have been following me for a while (real life, Twitter, Yammer, LinkedIn, …), you probably know I’ve been preaching about MSSQL as a container for way too long. My personal vision was to have MSSQL run in a container, with the data located outside of the container, which would enable a (relatively) easy path for the changes you want to implement.

[Image: blueprint of the MSSQL container with persistent storage, identity & repository]

So where volume mapping would be an option… I was also considering an integration with an external storage service. As an Azure fanatic, I (also) want to leverage the option of storing my data/temp files on Azure storage. This would provide me with totally host-independent storage persistence on Docker! For those who have been playing with Docker for a while, this is truly a powerful combination.

As a long-term goal, I would like to see this running on a “serverless” platform. From what I have seen in the market, this is still an unreachable utopia/Valhalla at this point. So my current objective in that area is to investigate the option of deploying this setup on Service Fabric, or to leverage the power of Rancher with Windows containers.

Continue reading “Behind the scenes : Creating a Microsoft SQL Server as a Windows / Docker Container”

Azure : Performance limits when using MSSQL datafiles directly on a Storage Account

Introduction

In a previous post I explained how you can integrate MSSQL with Azure storage by storing the data files directly on a storage account.

[Image: Remote Desktop session on the SQL virtual machine]

Now this made me wonder what the performance limitations of this setup would be. After doing some research, the basic rule is that the same logic applies to “virtual disks” as to the “data files”… Why is this? They are both blobs: the virtual disk is a blob exposed as a “disk”, while the data files are stored as “page blobs”.
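To give an idea of what that looks like in practice, here is a minimal sketch; the storage account, container and SAS token below are placeholders, not my actual setup:

-- Credential named after the container URL, using a Shared Access Signature.
-- The account/container names and the SAS secret are placeholders.
CREATE CREDENTIAL [https://mystorageaccount.blob.core.windows.net/sqldata]
WITH IDENTITY = 'SHARED ACCESS SIGNATURE',
     SECRET = 'sv=2015-04-05&sr=c&sig=...';  -- your own SAS token, without the leading '?'
GO

-- Create a database whose data & log files live directly on the storage account.
CREATE DATABASE KvaesTest
ON (NAME = KvaesTest_data,
    FILENAME = 'https://mystorageaccount.blob.core.windows.net/sqldata/KvaesTest_data.mdf')
LOG ON (NAME = KvaesTest_log,
    FILENAME = 'https://mystorageaccount.blob.core.windows.net/sqldata/KvaesTest_log.ldf');
GO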

[Image: Azure cloud storage pricing page]

Continue reading “Azure : Performance limits when using MSSQL datafiles directly on a Storage Account”

Azure : Setting up a highly available SQL cluster with standard edition

Introduction

It is important to know that you will only get an SLA (99.95%) with Azure when you have two machines deployed (within one availability set) that do the same thing. If this is not the case, then Microsoft will not guarantee anything. Why is that? Because during service windows a machine can go down, and those service windows are quite broad in terms of time; you will not be able to negotiate or know the exact downtime.

That being said… Setting up your own highly available SQL database is not that easy. There are several options, though it basically boils down to the following:

  • an AlwaysOn Availability Groups setup
  • a Failover Cluster backed by SIOS DataKeeper

While I really like AlwaysOn, there are two downsides to that approach:

  • to really enjoy it, you need the Enterprise edition (which isn’t exactly cheap)
  • not all applications support AlwaysOn in their implementations

So a lot of organisations were stranded in terms of SQL when moving to Azure. Though, thank god, a third-party tool came to the rescue: SIOS DataKeeper! Now we can build our traditional Failover Cluster on Azure.

 

Design

Before we start, let’s delve into the design for our setup:

[Image: design of the highly available SQL cluster on Azure with SIOS DataKeeper]

Continue reading “Azure : Setting up a highly available SQL cluster with standard edition”

Azure : Enterprise Grade SQL Provisioning

Introduction

The mailing list of Luper’s learning featured the following article:

Introducing a simplified configuration experience for SQL Server in Azure Virtual Machines

To me, this is very cool and shows the innovation that’s going on in the whole Azure landscape. Underneath, any Azure enthusiast will see that this is powered by ARM JSON templates. Though from a business perspective, the right questions are posed!

  • Who should be able to access it and how?
  • What are the storage capacity & performance expectations?
  • What about (technical) maintenance aspects like backup & patching?

How to find the machine?

Go to the marketplace in the Azure portal and find the following SQL Server offer:

[Image: SQL Server offer in the Azure Marketplace]

What will it bring to the table?

Continue reading “Azure : Enterprise Grade SQL Provisioning”

Azure : MSSQL – Impact of parameters on Backup Performance

A while ago we experienced an issue with slow backup performance on an Azure VM. In light of the experience we had on that case, I went into my lab environment to test the impact of different parameters…

What is the test environment we’ll be using?

  • Virtual Machine : D2
  • Storage : Z-drive, based upon Storage Spaces, with one disk (100 GB / LRS / Standard)
  • Paths : SQL binaries, data & log files on the Z-drive. The backup destination used was the temporary drive (D).
  • Database : AdventureWorks, enlarged several times up to about 1.5 GB (using a set of SQL commands by Jonathan Kehayias; see the sketch below), compared to the original 250 MB.
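For reference, the enlargement works by cross-joining existing rows into a new, bigger table. The snippet below is a minimal sketch of that idea with assumed table names, not the actual Kehayias script:

-- Grow the database by copying SalesOrderHeader multiple times into a new,
-- enlarged table (the multiplier set controls how many copies are made).
USE [AdventureWorks2012];
GO

SELECT soh.*
INTO Sales.SalesOrderHeaderEnlarged
FROM Sales.SalesOrderHeader AS soh
CROSS JOIN (VALUES (1), (2), (3), (4), (5), (6), (7), (8)) AS m (n);
GO

-- Quick check of the resulting database size.
EXEC sp_spaceused;
GO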

 

Test Run A
Command

BACKUP DATABASE [AdventureWorks2012]
TO DISK = N'd:\Backup\Test.bak' WITH NOFORMAT, NOINIT,
NAME = N'HC_TST-Full Database Backup', SKIP, NOREWIND, NOUNLOAD, STATS = 10

Result

Processed 192888 pages for database ‘AdventureWorks2012’, file ‘AdventureWorks2012_Data’ on file 1.
Processed 405530 pages for database ‘AdventureWorks2012’, file ‘AdventureWorks2012_Log’ on file 1.
BACKUP DATABASE successfully processed 598418 pages in 211.441 seconds (22.110 MB/sec).

Comment

We started out using a basic backup command… Nothing fancy, and we got 22 MB/s, which is not that great.

 

Test Run B
Command

BACKUP DATABASE [AdventureWorks2012]
TO DISK = N'd:\Backup\Test.bak' WITH INIT,
NAME = N'HC_TST-Full Database Backup', SKIP, COMPRESSION, NOREWIND, NOUNLOAD, STATS = 10

Result

Processed 192888 pages for database ‘AdventureWorks2012’, file ‘AdventureWorks2012_Data’ on file 1.
Processed 405535 pages for database ‘AdventureWorks2012’, file ‘AdventureWorks2012_Log’ on file 1.
BACKUP DATABASE successfully processed 598423 pages in 152.066 seconds (30.744 MB/sec).

Comment

Now we added compression, and the backup throughput went up by roughly 40%.

 

Test Run C
Command

BACKUP DATABASE [AdventureWorks2012original]
TO DISK = N'd:\Backup\Test.bak' WITH INIT,
NAME = N'HC_TST-Full Database Backup', SKIP, COMPRESSION, NOREWIND, NOUNLOAD, STATS = 10

Result

Processed 24192 pages for database ‘AdventureWorks2012original’, file ‘AdventureWorks2012_Data’ on file 1.
Processed 2 pages for database ‘AdventureWorks2012original’, file ‘AdventureWorks2012_Log’ on file 1.
BACKUP DATABASE successfully processed 24194 pages in 6.776 seconds (27.894 MB/sec).

Comment

Running the same command on the original (smaller) database gave throughput that was a tad lower.

 

Test Run D
Command

BACKUP DATABASE [AdventureWorks2012]
TO DISK = N'd:\Backup\Test.bak' WITH INIT,
NAME = N'HC_TST-Full Database Backup', SKIP, COMPRESSION, NOREWIND, NOUNLOAD, STATS = 10, BLOCKSIZE = 65536, MAXTRANSFERSIZE = 2097152

Result

Processed 192888 pages for database ‘AdventureWorks2012’, file ‘AdventureWorks2012_Data’ on file 1.
Processed 405539 pages for database ‘AdventureWorks2012’, file ‘AdventureWorks2012_Log’ on file 1.
BACKUP DATABASE successfully processed 598427 pages in 119.499 seconds (39.123 MB/sec).

Comment

Now we’ll add a fixed block size and max transfer size. Be aware that the default block size is 65536 bytes for tape devices and 512 bytes for anything else. So we beefed that up and got a bit less than double the throughput we started with.

 

Test Run E
Command

BACKUP DATABASE [AdventureWorks2012]
TO DISK = N'd:\Backup\Test.bak' WITH INIT,
NAME = N'HC_TST-Full Database Backup', SKIP, COMPRESSION, NOREWIND, NOUNLOAD, STATS = 10, BUFFERCOUNT = 20, BLOCKSIZE = 65536, MAXTRANSFERSIZE = 2097152

Result

Processed 192888 pages for database ‘AdventureWorks2012’, file ‘AdventureWorks2012_Data’ on file 1.
Processed 405546 pages for database ‘AdventureWorks2012’, file ‘AdventureWorks2012_Log’ on file 1.
BACKUP DATABASE successfully processed 598434 pages in 100.559 seconds (46.492 MB/sec).

Comment

Now we’ll add a buffer count, and things speed up even more.

 

Test Run F
Command

BACKUP DATABASE [AdventureWorks2012]
TO DISK = N'd:\Backup\Test.bak' WITH INIT,
NAME = N'HC_TST-Full Database Backup', SKIP, COMPRESSION, NOREWIND, NOUNLOAD, STATS = 10, BUFFERCOUNT = 100, BLOCKSIZE = 65536, MAXTRANSFERSIZE = 2097152

Result

Processed 192888 pages for database ‘AdventureWorks2012’, file ‘AdventureWorks2012_Data’ on file 1.
Processed 405550 pages for database ‘AdventureWorks2012’, file ‘AdventureWorks2012_Log’ on file 1.
BACKUP DATABASE successfully processed 598438 pages in 97.435 seconds (47.983 MB/sec).

Comment

And let’s pump up the buffer count… a tad better.

 

Test Run G
Command

BACKUP DATABASE [AdventureWorks2012original]
TO DISK = N'd:\Backup\Test.bak' WITH INIT,
NAME = N'HC_TST-Full Database Backup', SKIP, COMPRESSION, NOREWIND, NOUNLOAD, STATS = 10, BUFFERCOUNT = 100, BLOCKSIZE = 65536, MAXTRANSFERSIZE = 2097152

Result

Processed 24192 pages for database ‘AdventureWorks2012original’, file ‘AdventureWorks2012_Data’ on file 1.
Processed 3 pages for database ‘AdventureWorks2012original’, file ‘AdventureWorks2012_Log’ on file 1.
BACKUP DATABASE successfully processed 24195 pages in 4.437 seconds (42.600 MB/sec).

Comment

Doing the same with the smaller database gives us lower throughput.

 

Test Run H
Command

BACKUP DATABASE [AdventureWorks2012]
TO DISK = N'd:\Backup\Test.bak' WITH INIT,
NAME = N'HC_TST-Full Database Backup', SKIP, COMPRESSION, NOREWIND, NOUNLOAD, STATS = 10, BUFFERCOUNT = 100, BLOCKSIZE = 65536, MAXTRANSFERSIZE = 2097152

Result

Processed 648352 pages for database ‘AdventureWorks2012’, file ‘AdventureWorks2012_Data’ on file 1.
Processed 1592447 pages for database ‘AdventureWorks2012’, file ‘AdventureWorks2012_Log’ on file 1.
BACKUP DATABASE successfully processed 2240799 pages in 260.300 seconds (67.254 MB/sec).

Comment

I ran the enlargement script a few more times to grow the database to about 5.2 GB. The performance is better than with the two smaller databases.

Conclusion

  • Parameters : Choosing your parameters matters A LOT. We had a case that went from 9 MB/s to 90 MB/s by using the same set of parameters as above…
  • Database Size : There is a ramp-up time involved… Small databases experience a penalty due to this.
  • Design : Be aware of your disk subsystem design. One standard disk will provide you with 500 IOPS. If you use that disk for both source & target, you don’t need to be a professor to know that this won’t yield great performance (one mitigation is sketched below).
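For what it’s worth, one way to mitigate that last point is to stripe the backup across multiple files on different disks, so the write load gets spread out. A quick sketch with hypothetical drive letters:

-- Striping the backup over two files on two separate disks spreads the write
-- load; the paths and drive letters are illustrative only.
BACKUP DATABASE [AdventureWorks2012]
TO DISK = N'e:\Backup\Test_1.bak',
   DISK = N'f:\Backup\Test_2.bak'
WITH INIT, COMPRESSION, STATS = 10,
     BUFFERCOUNT = 100, BLOCKSIZE = 65536, MAXTRANSFERSIZE = 2097152;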