Data Workflows in Azure : Taking an end-to-end look from ingest to reporting!

Introduction

There are a lot of scenarios where organizations are leveraging Azure to process their data at scale. In today’s post I’m going to go through the various pieces that can complete the puzzle for you in such a workflow, starting from ingesting the data into Azure, and afterwards processing it in a scalable & sustainable manner.

 

High Level Architecture

As always, let’s start with a high level architecture of what we’ll be discussing today:

 

  • Ingest : The entire story starts here, where the data is being ingested into Azure. This can be done via an offline transfer (Azure Data Box), or online (via Azure Data Box Edge/Gateway, the REST API, AzCopy, …).
  • Staging Area : No matter what ingestion method you’re using, the data will end up in a storage location (which we’ll now dub the “Staging Area”). From there on we’ll be able to transfer it to its “final destination”.
  • Processing Area : This is the “final destination” for the ingested content. Why does this differ from the staging area? Because there are a variety of reasons to put data in another location, ranging from business rules and the linked conventions (like naming, folder structure, etc.) to more technical reasons like proximity to other systems or spreading the data across different storage accounts/locations.
  • Azure Data Factory : This service provides a low/no-code way of modelling out your data workflow & an awesome way of following up on your jobs in operations. It’ll serve as the key orchestrator for all your workflows.
  • Azure Functions : While there is already a good set of activities (“tasks”) available in ADF (Azure Data Factory), the ability to link Functions into it extends the possibilities for your organization even more. Now you can link your custom business logic right into the workflows.
  • Cosmos DB : As you probably want to keep some metadata on your data, we’ll be using Cosmos DB for that, with Functions serving as the front-end API layer to connect to that data.
  • Azure Batch / Databricks : Both Batch & Databricks can be directly called upon from ADF, providing key processing power in your workflows!
  • Azure Key Vault : Having secrets lying around & possibly being exposed is never a good idea. Therefore it’s highly recommended to leverage the Key Vault integration for storing your secrets!
  • Azure DevOps : Next to the above, we’ll be relying on Azure DevOps as our core CI/CD pipeline and trusted code repository. We can use it to build & deploy our Azure Functions & Batch applications, as well as for storing our ADF templates & Databricks notebooks.
  • Application Insights : Key to any successful application is collecting the much needed telemetry, and Application Insights is more than suited for this task.
  • Log Analytics : ADF provides native integration with Log Analytics. This will provide us with an awesome way to take a look at the status of our pipelines & activities.
  • Power BI : In terms of reporting, we’ll be using Power BI to collect the data that was pumped into Log Analytics and join it with the metadata from Cosmos DB, thus providing us with live data on the status of our workflow!
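
To make the split between the staging area and the processing area a bit more tangible, here’s a minimal Python sketch of how a naming convention could be applied when a file moves from one to the other. The `IngestedFile` record, the `processing_path` helper and the source/yyyy/mm/dd convention are all assumptions made up for illustration; they are not something ADF prescribes.

```python
from dataclasses import dataclass
from datetime import date
from pathlib import PurePosixPath


@dataclass(frozen=True)
class IngestedFile:
    """Metadata for one ingested file (hypothetical shape)."""
    source_system: str
    staging_path: str
    ingested_on: date


def processing_path(item: IngestedFile) -> str:
    """Map a staging path to a processing-area path.

    Assumed convention: <source>/<yyyy>/<mm>/<dd>/<file name>,
    all lowercase, with spaces in the file name replaced by underscores.
    """
    name = PurePosixPath(item.staging_path).name.lower().replace(" ", "_")
    day = item.ingested_on
    return f"{item.source_system.lower()}/{day:%Y/%m/%d}/{name}"


example = IngestedFile("CRM", "staging/upload-42/Customer Export.csv", date(2019, 3, 7))
print(processing_path(example))  # crm/2019/03/07/customer_export.csv
```

In a setup like the one above, logic of this kind would typically live in an Azure Function that ADF calls, with the resulting path stored as metadata in Cosmos DB.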

 

Now let’s take a look at that End-to-End flow!


IoT Prototyping in Azure with Particle & Grove

Introduction

Today’s post is about what I see as the smoothest way to do prototyping & hobby projects for IoT. What is my main principle in deciding this? I only want to spend time on “business logic” and not waste time on the nuts & bolts of the engine.

Architecture

So what’s the architecture we’ll be using for this?

  1. Device : Particle Photon + Grove Expansion Board + Grove Sensors (Temperature & Air Quality)
  2. Particle Platform : Used for the development
  3. Azure IoT Hub : Basically a 1:1 link with Particle, which will take over once we go to a production grade setup.
  4. Azure Stream Analytics : Streaming the ingested data from our IoT Hub towards our various landing zones.
  5. Azure CosmosDB : For storing the data we’ll use in our reports.
  6. Azure Storage Account : Cheap storage where we keep all the data we collected, and which we could use for our analytics.
  7. Power BI : To make nice reports of the data we collected. 😉
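
As a rough illustration of the two landing zones, here’s a small Python sketch that splits one device message into a raw line for the cheap storage account and a trimmed document for the reporting store. In the actual architecture this routing is done by Stream Analytics; the field names and the `to_landing_zones` helper below are purely hypothetical.

```python
import json

# Hypothetical field names; adjust to whatever your device actually publishes.
REPORT_FIELDS = ("device_id", "timestamp", "temperature_c", "air_quality")


def to_landing_zones(raw_message):
    """Split one device message into (archive_line, report_document).

    archive_line    -> kept untouched for the cheap storage account
    report_document -> trimmed copy holding only the fields used in reports
    """
    reading = json.loads(raw_message)
    report_document = {key: reading[key] for key in REPORT_FIELDS if key in reading}
    return raw_message, report_document


sample = json.dumps({
    "device_id": "photon-01",
    "timestamp": "2018-01-01T12:00:00Z",
    "temperature_c": 21.5,
    "air_quality": 42,
    "firmware": "0.7.0",  # extra field: archived, but not used in reports
})
archive_line, report_document = to_landing_zones(sample)
print(report_document)
```

The idea is simply that the archive keeps everything we collected, while the reporting store only holds what the reports need.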

Now let’s delve into these parts one by one!


Did anyone say Azure Active Directory reports in PowerBi?

Introduction

A few days ago it was announced that a Power BI content pack has been published for Azure Active Directory! So let’s take that one out for a spin today and see what it can bring to the table.

 

Setting up the integration

This is one of the reasons I really like “cloud”! Integration has almost no entry barrier! Anyhow, in Power BI, click on “Get Data”.



Basic Azure IoT Flow : From Event Hub via Stream Analytics to Power Bi

Introduction

A few weeks back I posted a blog post on how you can leverage “serverless” components for IoT. Today I’ll show you what it would mean if we replace the Azure Functions component in that post with Azure Stream Analytics.

 

Flow

So the flow between device and event hub is untouched, though we’ll replace the Functions part with Azure Stream Analytics. So how will the end result look?

[Screenshot: Stream Analytics job diagram in the Azure portal]

We’ll be using one Stream Analytics job to trigger three flows. One will store the data in Azure Table Storage, another one will store it as a JSON file in Azure Blob Storage and the last one will stream it directly into a Power BI dataset.
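
To picture the “one input, three outputs” idea, here’s a minimal Python sketch of the fan-out. It’s only an in-memory simulation: in the real job this is expressed in the Stream Analytics query (one SELECT … INTO per output), and the sink functions, field names and sample events below are assumptions for illustration, not Azure SDK calls.

```python
import json


def fan_out(events, sinks):
    """Send every event to every sink; return the number of events handled."""
    handled = 0
    for event in events:
        for sink in sinks:
            sink(event)
        handled += 1
    return handled


# In-memory stand-ins for the three outputs (assumptions, not real Azure clients).
table_rows = []    # Table Storage: (PartitionKey, RowKey, entity)
blob_lines = []    # Blob Storage: one JSON document per line
dataset_rows = []  # Power BI dataset rows


def to_table(event):
    # Hypothetical key choice: device id as partition key, event time as row key.
    table_rows.append((event["deviceId"], event["time"], event))


def to_blob(event):
    blob_lines.append(json.dumps(event))


def to_power_bi(event):
    dataset_rows.append({"time": event["time"], "temperature": event["temperature"]})


events = [
    {"deviceId": "device-1", "time": "2017-01-03T11:00:00Z", "temperature": 20.1},
    {"deviceId": "device-1", "time": "2017-01-03T11:00:10Z", "temperature": 20.4},
]
handled = fan_out(events, [to_table, to_blob, to_power_bi])
print(handled, len(table_rows), len(blob_lines), len(dataset_rows))
```

Each sink receives the same events but is free to shape them differently, which is also what the three outputs do in the Stream Analytics job.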

So let’s take a look at all the components from within this Stream Analytics Flow we’ll be using…

[Screenshot: Stream Analytics job inputs in the Azure portal]
