Taking the Azure Data Box Gateway (preview) out for a spin!

Introduction

At the last Ignite conference, three new additions joined the Data Box family. In today’s post we’ll take one of those out for a spin: the “Data Box Gateway”. This one comes as a virtual appliance that you can run on top of your own physical hardware.

 

So where does it fit into the picture?

  • Cloud archival – Copy hundreds of TBs of data to Azure storage using Data Box Gateway in a secure and efficient manner. The data can be ingested one time or on an ongoing basis for archival scenarios.
  • Data aggregation – Aggregate data from multiple sources into a single location in Azure Storage for data processing and analytics.
  • Integration with on-premises workloads – Integrate with on-premises workloads such as backup and restore that use cloud storage and need local access for commonly used files.

 

Let’s take it for a spin!

So let’s make it a bit more tangible and see what the user experience is in setting it up & using it. Start by searching the Azure Marketplace for Data Box Gateway.

Once you press “create”, you can choose to go for Data Box Edge or Data Box Gateway ;

Here we’ll select the Data Box Gateway, as this is the one we’re going to take out for a spin… So we’ll name it and choose our resource group, subscription & location.

After we press “Create”, it’ll start deploying.
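
By the way, if you’d rather script this than click through the portal, the same resource can also be created from PowerShell. A minimal sketch, assuming the Az.DataBoxEdge module is available; the resource names (and the “Gateway” SKU value) are placeholders/assumptions on my part ;

  # Assumption: the Az.DataBoxEdge module is installed (Install-Module Az.DataBoxEdge)
  # and you are already logged in via Connect-AzAccount.
  $rg       = "rg-databox-demo"        # hypothetical resource group
  $name     = "dbgw-demo-01"           # hypothetical device name
  $location = "westeurope"

  # Create the resource group (-Force suppresses the confirmation prompt if it already exists).
  New-AzResourceGroup -Name $rg -Location $location -Force

  # Create the Data Box Gateway resource; the SKU distinguishes the virtual Gateway from the physical Edge.
  New-AzDataBoxEdgeDevice -ResourceGroupName $rg -Name $name -Location $location -Sku Gateway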

Once done, we’re presented with a step-by-step guide on setting it up.

As I’m going to deploy on ESXi, I went for the VMware image ;

Be aware, though, that you need to meet the minimum requirements. As I was doing this test on my home (gaming) PC, I must admit this was a bit challenging.

Next to that, also make sure to use VMFS5 (or an NFS mount!) for your datastore ;
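
If you want to double-check your datastore from the command line, PowerCLI can list the filesystem versions. A small sketch, assuming VMware PowerCLI is installed; the server name is a placeholder and the property names are from memory ;

  # Assumption: VMware PowerCLI is installed (Install-Module VMware.PowerCLI).
  Connect-VIServer -Server "esxi-01.lab.local"   # placeholder host

  # List datastores with their type and filesystem version; for the Gateway disks
  # you want a VMFS5 (or NFS) backed datastore with enough free space.
  Get-Datastore | Select-Object Name, Type, FileSystemVersion, FreeSpaceGB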

 

Post-Deploy Installation Steps

Once you are sure you meet the requirements, just follow the guide as instructed in the documentation. After a bit, you’re prompted to log in. Be aware that by default the keyboard is set to QWERTY. Keep that in mind to avoid the following screen… 😉

Once logged in, the post-installation will kick off ;

During the installation, if you browse to the device, you’ll be presented with the following screen ;

Depending on your system, this might take a while… Be patient though, and at a given point you’ll be given the opportunity to log in (again with the default password) ;

Once logged in, you can change the password ;

And once changed, you are prompted to login again ;

Once logged in, we can see that our device isn’t activated yet… and that it’s still loading up things dynamically (like the capacity).

Those loading indicators quickly disappear, as you can imagine ;

Now let’s click on “Activation” to get the device… like.. um … activated!

In the meantime, let’s also get the activation key… Remember the overview page with all the practical information? It has a “Generate Key” button down below.

If you press it, a box will appear in which the key will end up once generated.

And let’s copy it over!

Let’s paste it into the activation box of our appliance. And apply it…

Once activated, you can change the connectivity mode (if desired) ;

And we’ll see that the configuration pane in our dashboard is now showing “Activation Completed”, and also indicating that cloud upload/download is enabled.

Let’s browse a bit through the interface… Device Name ;

Network settings ;

Web proxy settings ;

Time settings ;

Power settings ;

Software update ;

And diagnostic tests ;

Let’s run a test, shall we?

And it seems everything is “Healthy”. Phew!!! 😉
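
On top of the built-in diagnostics, a quick client-side sanity check never hurts. The sketch below simply verifies that the local web UI (443) and the SMB port (445) of the appliance respond; the IP is of course a placeholder ;

  # Placeholder IP of the Data Box Gateway appliance.
  $ip = "10.0.0.50"

  # Check the local web UI (HTTPS) and the SMB port used by the shares.
  Test-NetConnection -ComputerName $ip -Port 443
  Test-NetConnection -ComputerName $ip -Port 445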

The notifications give us the typical look & feel of the Azure Portal ;

The “Support” section provides us with the ability to generate a “Support package” ;

Which we can then download ;

And, as you could expect, a lot of information about the setup/device/etc. is collected there for support purposes.
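
Side note ; next to the local web UI, the documentation also describes a (restricted) PowerShell interface on the device. From memory, connecting to it looks roughly like this. The IP, the “Minishell” configuration name and the “EdgeUser” account are assumptions on my part, so double-check the docs ;

  # Assumption: run this from a Windows client on the same network as the appliance.
  $ip = "10.0.0.50"   # placeholder IP of the Data Box Gateway VM

  # Trust the device for WinRM.
  Set-Item WSMan:\localhost\Client\TrustedHosts -Value $ip -Concatenate -Force

  # Enter the restricted remote session; you'll be prompted for the device password.
  Enter-PSSession -ComputerName $ip -ConfigurationName Minishell -Credential "$ip\EdgeUser"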

 

Configuration from the Azure Portal

Now that our device is all set up, let’s take a look at the portal… We can add bandwidth schedules ;

Users… (where AD integration is on the backlog) ;

SMB shares and…  

also NFS shares. These can be backed by block blobs, page blobs or files!

The properties provide a bit of info… 😉

And we can monitor both the capacity …

and the transactions (network) ;
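
For completeness ; most of what we just clicked through can also be scripted. The sketch below uses the Az.DataBoxEdge cmdlets as I remember them from the docs. Cmdlet and parameter names are assumptions on my part (check Get-Help if they’ve moved), and all names/values are placeholders ;

  # Assumption: Az.DataBoxEdge module installed and Connect-AzAccount already done.
  $rg     = "rg-databox-demo"
  $device = "dbgw-demo-01"

  # Add a share user (the password has to be a SecureString).
  $secPwd = ConvertTo-SecureString "Sup3rS3cret!" -AsPlainText -Force
  New-AzDataBoxEdgeUser -ResourceGroupName $rg -DeviceName $device -Name "shareuser" -Password $secPwd

  # Add a bandwidth schedule: cap uploads at 20 Mbps during office hours on weekdays.
  New-AzDataBoxEdgeBandwidthSchedule -ResourceGroupName $rg -DeviceName $device -Name "office-hours" `
      -DaysOfWeek Monday,Tuesday,Wednesday,Thursday,Friday `
      -StartTime "08:00:00" -StopTime "18:00:00" -Bandwidth 20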

 

Testing the online ingest!

The proof of the pudding is in the eating… Or that’s what people always say. I’m more of a chocolate mousse man… 😉 Anyhow, we’re getting distracted here. Let’s add an (SMB) share and link it to a block blob container on a storage account.

That went smoothly!

Now let’s see if we can map that share…

Tum ti dum…

Providing the credentials of the user we added…

Connecting…

And there we go! It comes with a placeholder folder inside…
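
For reference, mapping it from the command line instead of the Explorer dialog boils down to something like this. The device IP, share name and user are of course placeholders ;

  # Placeholders for the appliance IP, the share we created and the share user.
  $ip    = "10.0.0.50"
  $share = "testshare"
  $user  = "shareuser"

  # Option 1: classic net use (the trailing * prompts for the password).
  net use Z: "\\$ip\$share" /user:$user *

  # Option 2: a PowerShell PSDrive with explicit credentials (pick one of the two).
  New-PSDrive -Name Z -PSProvider FileSystem -Root "\\$ip\$share" -Credential (Get-Credential $user) -Persist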

Now I’m going to copy over a modest amount (28 MB) of test data.

Which gets synced nicely!

The content of the container in the storage account and the content of the share match perfectly.
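
If you want to verify that from the shell rather than the portal, listing the blobs in the target container does the trick. The storage account, resource group and container names below are placeholders ;

  # Placeholders: the storage account and container backing the share.
  $rg        = "rg-databox-demo"
  $account   = "mystorageaccount"
  $container = "testsharecontainer"

  # Build a storage context from one of the account keys and list what landed in the container.
  $key = (Get-AzStorageAccountKey -ResourceGroupName $rg -Name $account)[0].Value
  $ctx = New-AzStorageContext -StorageAccountName $account -StorageAccountKey $key
  Get-AzStorageBlob -Container $container -Context $ctx | Select-Object Name, Length, LastModified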

If we check out the metrics, then we can see a nice spike when the data was uploaded.

For our next test, I’m gonna copy over a bit more than 5GB of data.

Where we see the synchronization immediately kicking in…

And if you check out the details in Windows Explorer, you’ll notice that some folders have an “X” mark while others don’t.

If I go one level deeper, it becomes a bit more granular.

Here the story is pretty easy… If you see an “X”, then that object is fully replicated to the cloud.

As you can see, we have 5 files in our storage account… All of those have an “X”. Two files still need to be fully replicated, and those have yet to be marked with an “X”.

Once done, we can see that the data was copied over nicely.  

Be aware, though, that the limiting factor here was my personal internet connection, as I did not set any bandwidth limitations.

 

Time to power down

As my testing phase was done, I powered down my virtual machine…

And Azure picked up this event nicely.

So you can also set up alerts on this if you would like!

 

Closing Thoughts

There are a variety of ways to ingest data into Azure!

  • Off-line ; the physical Data Box devices, which can go up to 1 PB per box…
  • On-line
    • Data Box Edge is a hardware-based appliance which even provides you with the capability to pre-process data before ingesting it into the cloud.
    • Data Box Gateway is a virtual appliance which lets you leverage your existing hardware and ingest towards the cloud.

Today we looked at the Data Box Gateway, which does as advertised! Be aware, though, that the appliance was designed with ingestion in mind. Do not see this as a replacement for your general-purpose file server, as you’ll come back a tad disappointed. But if you’re looking to push data to the cloud or to have a (share-based) archive mount point, then this is a great way of doing so. I can immediately see it powering PACS deployments and/or shipping (e.g. sensor) data to the cloud.

9 thoughts on “Taking the Azure Data Box Gateway (preview) out for a spin!”

  1. Hi Karim,

    In the docs Microsoft refers to fast access for local caching (hot tier). Do you have technical insights into at what level you can configure this local cache (modification date, % of disk, or…)? Thanks in advance!

    Nichola 😀

    1. Sorry Nichola, missed your Q… Check the reply I gave to Matteo. It’ll probably clear up a lot of things.

  2. Thanks for such a guide, Karim.
    I am trying to test the Gateway virtual appliance in my environment. The setup script fails during the initial setup. Are you familiar with this error?
    Thanks in advance for your help.

    Invoke-Command : Cannot validate argument on parameter ‘Session’. The argument is null or empty. Provide an argument
    that is not null or empty, and then try the command again.
    At C:\hcs\HCSMinishell.ps1:39 char:39
    + $result = Invoke-Command -Session $Global:PSSession -ScriptBlock …
    + ~~~~~~~~~~~~~~~~~
    + CategoryInfo : InvalidData: (:) [Invoke-Command], ParameterBindingValidationException
    + FullyQualifiedErrorId : ParameterArgumentValidationError,Microsoft.PowerShell.Commands.InvokeCommandCommand

    —————————————————————
    Data Box Gateway
    Copyright (C) 2018 Microsoft Corporation. All rights reserved.
    One-time setup in progress. Commands may not function.
    —————————————————————
    Enter-PSSession : Cannot validate argument on parameter ‘Session’. The argument is null or empty. Provide an argument
    that is not null or empty, and then try the command again.
    At C:\hcs\HCSMinishell.ps1:289 char:30
    + Enter-PSSession -Session $Global:PSSession
    + ~~~~~~~~~~~~~~~~~
    + CategoryInfo : InvalidData: (:) [Enter-PSSession], ParameterBindingValidationException
    + FullyQualifiedErrorId : ParameterArgumentValidationError,Microsoft.PowerShell.Commands.EnterPSSessionCommand

    PS C:\Users\EdgeUser>

    1. Have not encountered this one before. Best course of action would be to open a support request (if you haven’t already done so).

  3. Hi Karim, thanks for this guide.
    I successfully deployed a Gateway virtual device, aiming to test its performance before eventually adopting the Edge physical appliance. I wonder whether such a device could be used to provide a local cache for file sharing while using Azure Files accessed via the SMB/CIFS protocols. MS is not being very clear on this aspect. Would you suggest such a solution?

    Many thanks

    1. It will have a local cache. Though be aware that there is a big difference between StorSimple and Data Box. StorSimple was built with tiering in mind, whereas Data Box was built as a way to ingest data into Azure. It’ll have a cache, but that mainly serves as the “temporary” storage before uploading the data. With Data Box Edge you can have a local share too, which does not replicate. The main gist, though, is that it was built for a one-direction ingest flow and not a bi-directional sync; for that, Azure File Sync is a better match. Does that make sense?

      1. Thanks for getting back. I appreciate that you should not rely on the Edge Box’s local cache, as I am told you cannot control what the machine stores locally. However, according to the documentation the selective-content process (what they call Machine Learning) should take place once you reach half of the capacity of the appliance (therefore not before 6 TB). As per my understanding, this means that if you have just a couple of TB of data, doing so should be relatively safe. Unfortunately StorSimple does not fulfil my goal (which is basically to make the office serverless) as it requires maintenance more or less along the same lines as a classic filer. The Edge Box on the other hand would have given me a peace-of-mind device which (in my view) would have given the best user experience (SMB/CIFS access provided via LAN and not WAN), but still with the safety of the background sync to the cloud if any disaster occurs. It looks like I have to look for other options…

  4. Take a look at Azure File Sync ;
    https://docs.microsoft.com/en-us/azure/storage/files/storage-sync-files-deployment-guide?tabs=azure-portal

    That will probably be more suited for your use case. Though it is not an appliance… The upside is that in case of disaster, one could run to the local IT shop and get a replacement system (hardware). You can then install Azure File Sync on it, and link it up again. Afterwards it will retrieve the files on demand again, so the users will be able to work again. It might be at a lower retrieval speed. Though functional… Worst case, you could even get the data off straight from the Azure Files connection.

    1. Thanks! It still looks rather viable to me, but very risky, as latency will still play a major role and could affect the user experience way too much. Are you aware of any caching gateway appliance which could be suitable for my purposes (basically what the Edge box does, but with the possibility to store contents locally until, say, 80% of the capacity)? Many thanks once again

