Ever tried the mix of Azure, SQL Server, Storage Spaces Direct & Scale Out File Server?

Introduction

A while back I posted a blog post how to setup a High Available SQL cluster on Azure using SIOS Datakeeper. As I’m an avid believer of storage spaces, I was looking for a moment to test drive “storage spaces direct” on Azure. The blog post of today will cover that journey…

UPDATE (01/02/2017) ; At this point, there is no official support for this solution. So do not implement it for production at this point. As soon as this changes, I’ll update this post accordingly!

UPDATE (08/02/2017) ; New official documentation has been released. Though I cannot find official support statements.

UPDATE (30/03/2017) ; A few days after the previous update, the following post was made => Deploying IaaS VM Guest Clusters in Microsoft Azure

 

Solution Blueprint

What do we want to build today?

  • A two node cluster which will be used as a Failover Cluster Instance for MSSQL.
  • As a quorum, we’ll be using the cloud witness feature of Windows 2016 in combination with an Azure storage account.
  • In regards to storage, we’ll create a Scale Out File Server setup which will leverage the local disks of the two servers via Storage Spaces Direct.
  • To achieve a “floating IP”, we’ll be using the Azure LoadBalancer setup (as we did in the last post).

kvaes-sql-cluster-s2d-sofs-azure

 

Continue reading “Ever tried the mix of Azure, SQL Server, Storage Spaces Direct & Scale Out File Server?”

Azure : Benchmarking SQL Database Setups – To measure is to know, and being able to improve…

Introduction

To measure is to know. If you can not measure it, you cannot improve it!

Today’s post will go more in-depth on what performance to expect from different SQL implementations in Azure. We’ll be focussing on two kind of benchmarks ; the storage subsystem and an industry benchmark for SQL. This so that we can compare the different scenario’s to each other in the most neutral way possible.

to-measure-is-to-know-storage-database-performance-kvaes

Test Setup

As a test bed I started from one of my previous posts

kvaes-azure-sql-cluster-sios-datakeeper-high-availability-ha

The machines I used were DS1 v2 machines when using single disks and a DS2 v2 machines when using multiple disks. In terms of OS, I’ll be using Windows 2012 R2 and MSSQL 2014 (12.04100.1) as database.

Continue reading “Azure : Benchmarking SQL Database Setups – To measure is to know, and being able to improve…”

Azure : Setting up a high available SQL cluster with standard edition

Introduction

It is important to know that you will only get an SLA (99,95%) with Azure when you have two machines deployed (within one availability set) that do the same thing. If this is not the case, then Microsoft will not guarantee anything. Why is that? Because during service windows, a machine can go down. Those service windows are quite broad in terms of time where you will not be able to negotiate or know the exact downtime.

That being said… Setting up your own high available SQL database is not that easy. There are several options, though it basically bears down to the following ;

  • an AlwaysOn Availability Groups setup
  • a Failover Cluster backed by SIOS datakeeper

Where I really like AlwaysOn, there are two downsides to that approach ;

  • to really enjoy it, you need the enterprise edition (which isn’t exactly cheap)
  • not all applications support AlwaysOn with their implementations

So a lot of organisations were stranded in terms of SQL and moving to Azure. Though, thank god, a third party tool introduced itself ; SIOS Datakeeper ! Now we can build our traditional Failover Cluster on Azure.

 

Design

Before we start, let’s delve into the design for our setup ;

kvaes-azure-sql-cluster-sios-datakeeper-high-availability-ha

Continue reading “Azure : Setting up a high available SQL cluster with standard edition”

Azure : Direct SQL Server to Storage Account Backup

Introduction
Did you know you can back up (and restore) your SQL data to (and from) your Azure Storage Account? No… Check out the following article ; SQL Server Backup to URL

IC639738.jpeg

How does that look in reality?
Browse to your storage account and click on “Manage Access Keys”…
2015-09-04 10_30_23-Storage - Microsoft Azure

Copy the “Primary Access Key” to your clipboard…
2015-09-04 10_30_56-Storage - Microsoft Azure

Now head over to your SQL Management studio and create the following credential…

CREATE CREDENTIAL [https://kvaestest01.blob.core.windows.net/sqlbackup] WITH IDENTITY='kvaestest01', SECRET='Dnwkn...SydB...gfsZjm...TJ9Ew=='

  • Credential Name : The url of your backup location
  • Identity : The name of your storage account
  • Secret : The primary access key

And you will see it pop up in your objects…

2015-09-04 10_31_29-kvaestest01-6 - kvaestest.cloudapp.net_55099 - Remote Desktop Connection

Now we are ready for the action!

BACKUP DATABASE AdventureWorks2012 TO
URL = 'https://kvaestest01.blob.core.windows.net/sqlbackup/AV2012_01.bak'
WITH CREDENTIAL = 'https://kvaestest01.blob.core.windows.net/sqlbackup'
,COMPRESSION
,STATS = 5;
GO

One thing to note. You can only use one URL in your backup command. So multiple backup files for concurrency reasons is sadly not possible.

Have fun!

Database variants explained : SQL or NoSQL? Is that really the question?

A first glance beyond the religion

When taking a look towards the landscape of databases, one can only accept that there has been a lot of commotion about “SQL vs NoSQL” in the last years. But what is it really about?

SQL, which stands for “Structured Query Language”, has been around since the seventies and is commonly used in relational databases. It consists of a data definition language to define the structure and a data manipulation language to alter the data within the structure. Therefore a RDBMS will have a defined structure and has been a common choice for the storage of information in new databases used for financial records, manufacturing and logistical information, personnel data, and other applications since the 1980s.

1401269083847

NoSQL, which stands for “Not only SQL”, departs from the standard relational model since it saw its first introduction in the nineties. The primary focus of these database was performance, or a given niche, and focus less consitency/transactions. These databases provide a mechanism for storage and retrieval of data that is modeled in means other than the tabular relations used in relational databases. Motivations for this approach include simplicity of design, horizontal scaling, and finer control over availability. The data structures used by NoSQL databases (e.g. key-value, graph, or document) differ from those used in relational databases, making some operations faster in NoSQL and others faster in relational databases. The particular suitability of a given NoSQL database depends on the problem it must solve.

So it depends on your need…

Do you want NoSQL, NoSQL, NoSQL or NoSQL?

NoSQL comes in various flavors. The most common types of NoSQL databases (as portrayed by Wikipedia) ;

There have been various approaches to classify NoSQL databases, each with different categories and subcategories. Because of the variety of approaches and overlaps it is difficult to get and maintain an overview of non-relational databases. Nevertheless, a basic classification is based on data model. A few examples in each category are:

  • Column: Accumulo, Cassandra, Druid, HBase, Vertica
  • Document: Clusterpoint, Apache CouchDB, Couchbase, MarkLogic, MongoDB, OrientDB
  • Key-value: Dynamo, FoundationDB, MemcacheDB, Redis, Riak, FairCom c-treeACE, Aerospike, OrientDB
  • Graph: Allegro, Neo4J, InfiniteGraph, OrientDB, Virtuoso, Stardog
  • Multi-model: OrientDB, FoundationDB, ArangoDB, Alchemy Database, CortexDB

Column

A column of a distributed data store is a NoSQL object of the lowest level in a keyspace. It is a tuple (a key-value pair) consisting of three elements:

  • Unique name: Used to reference the column
  • Value: The content of the column. It can have different types, like AsciiType, LongType, TimeUUIDType, UTF8Type among others.
  • Timestamp: The system timestamp used to determine the valid content.

Example

{
    street: {name: "street", value: "1234 x street", timestamp: 123456789},
    city: {name: "city", value: "san francisco", timestamp: 123456789},
    zip: {name: "zip", value: "94107", timestamp: 123456789},
}

Document

A document-oriented database is designed for storing, retrieving, and managing document-oriented information, also known as semi-structured data. The central concept of a document-oriented database is that Documents, in largely the usual English sense, contain vast amounts of data which can usefully be made available. Document-oriented database implementations differ widely in detail and functionality. Most accept documents in a variety of forms, and encapsulate them in a standardized internal format, while extracting at least some specific data items that are then associated with the document.

Example

<Article>
   <Author>
       <FirstName>Bob</FirstName>
       <Surname>Smith</Surname>
   </Author>
   <Abstract>This paper concerns....</Abstract>
   <Section n="1"><Title>Introduction</Title>
       <Para>...
   </Section>
 </Article>

Key-Value

A key-value (an associative array, map, symbol table,or dictionary) is an abstract data type composed of a collection of key/value pairs, such that each possible key appears just once in the collection.

Example

{
    "Pride and Prejudice": "Alice",
    "The Brothers Karamazov": "Pat",
    "Wuthering Heights": "Alice"
}

Graph

A graph database is a database that uses graph structures for semantic queries with nodes, edges, and properties to represent and store data. A graph database is any storage system that provides index-free adjacency. This means that every element contains a direct pointer to its adjacent elements and no index lookups are necessary. General graph databases that can store any graph are distinct from specialized graph databases such as triplestores and network databases.

Example

GraphDatabase_PropertyGraph

MultiModel

Most database management systems are organized around a single data model that determines how data can be organized, stored, and manipulated. In contrast, a multi-model database is designed to support multiple data models against a single, integrated backend. Document, graph, relational, and key-value models are examples of data models that may be supported by a multi-model database.

And what flavor do I want?

Each type and implementation has its own advantages… The following chart from Shankar Sahai provides a good overview ;

nosql-comparison-table

Any other considerations I should take into account?

Be wary that most implementations were not designed around consistency integrity and more towards performance. Transactions are referential integrity are not supported by most implementations. High availability designs (including on geographic level) are possible with some implementations, though this often implies a performance impact (as one would expect).

Also check out the research made by Altoros ;

5. Conclusion
As you can see, there is no perfect NoSQL database. Every database has its advantages and disadvantages that become more or less important depending on your preferences and the type of tasks.
For example, a database can demonstrate excellent performance, but once the amount of records exceeds a certain limit, the speed falls dramatically. It means that this particular solution can be good for moderate data loads and extremely fast computations, but it would not be suitable for jobs that require a lot of reads and writes. In addition, database performance also depends on the capacity of your hardware.

They did a very decent job in performance testing various implementations!

2015-01-21 09_08_23-A_Vendor_independent_Comparison_of_NoSQL_Databases_Cassandra_HBase_MongoDB_Riak.