Data Workflows in Azure : Taking an end-to-end look from ingest to reporting!

Introduction

There are a lot of scenario’s where organization are leveraging Azure to process their data at scale. In today’s post I’m going to go through the various pieces that can connect the puzzle for you in such a work flow. Starting from ingesting the data into Azure, and afterwards processing it in a scalable & sustainable manner.

 

High Level Architecture

As always, let’s start with a high level architecture to discuss what we’ll be discussing today ;

 

  • Ingest : The entire story starts here, where the data is being ingested into Azure. This can be done via an offline transfer (Azure DataBox), or online via (Azure DataBox Edge/Gateway, or using the REST API, AzCopy, …).
  • Staging Area : No matter what ingestation method you’re using, the data will end up in a storage location (which we’ll now dub “Staging Area”). From there one we’ll be able to transfer it to it’s “final destination”.
  • Processing Area : This is the “final destination” for the ingested content. Why does this differ from the staging area? Cause there are a variety of reasons to put data in another location. Ranging from business rules and the linked conventions (like naming, folder structure, etc), towards more technical reasons like proximity to other systems or spreading the data across different storage accounts/locations.
  • Azure Data Factory : This service provides a low/no-code way of modelling out your data workflow & having an awesome way of following up your jobs in operations. It’ll serve as the key orchestrator for all your workflows.
  • Azure Functions : Where there are already a good set of activities (“tasks”) available in ADF (Azure Data Factory), the ability to link functions into it extends the possibility for your organization even more. Now you can link your custom business logic right into the workflows.
  • Cosmos DB : As you probably want to keep some metadata on your data, we’ll be using Cosmos DB for that one. Where Functions will serve as the front-end API layer to connect to that data.
  • Azure BatchData Bricks : Both Batch & Data Bricks can be directly called upon from ADF, providing key processing power in your workflows!
  • Azure Key Vault : Having secrets lying around & possibly being exposed is never a good idea. Therefor it’s highly recommended to leverage the Key Vault integration for storing your secrets!
  • Azure DevOps : Next to the above, we’ll be relying on Azure DevOps as our core CI/CD pipeline and trusted code repository. We can use it to build & deploy our Azure Functions & Batch Applications, as for storing our ADF templates & Data Bricks notebooks.
  • Application Insights : Key to any successful application is collecting the much needed telemetry, where Application Insights is more than suited for this task.
  • Log Analytics : ADF provides native integration with Log Analytics. This will provide us with an awesome way to take a look at the status of our pipelines & activities.
  • PowerBI : In terms of reporting, we’ll be using PowerBI to collect the data that was pumped into Log Analytics and joining it with the metadata from Cosmos DB. Thus providing us with live data on the status of our workflow!

 

Now let’s take a look at that End-to-End flow!

Continue reading “Data Workflows in Azure : Taking an end-to-end look from ingest to reporting!”

Azure Functions : Compiled or interpreted C#… What impact does it have on my performance?

Introduction

Last week I did a post about how to integrate Compiled Azure Functions working with VSTS… In the closing thoughts I made a statement about my observation that compiled functions had a performance improvement.

 

Here I should have known Nills would challenge me on that… 😉

 

So… #challengeaccepted

Continue reading “Azure Functions : Compiled or interpreted C#… What impact does it have on my performance?”

Replatforming Azure Functions into an Azure Functions Container

Introduction

A while ago I talked about  “Faas/Serverless” in relation to vendor lock-in. Today we’ll be continuing in that road, where we’ll be doing a small proof-of-concept (PoC). In this PoC, we’ll be replatforming existing Azure Functions code into an Azure Functions container!

 

Things to know

Since Azure Functions 2.0 (in preview at the time of writing this post), you are able to leverage containers. Though be aware that there are several known issues. Do check them out first before embarking on your journey!

 

Testdriving 2.0

So first, we’ll start off with testing the Azure Functions Core Tools!  If you’re looking to follow this guide, be sure to have the Azure Functions Core Tools installed, which also depends on .NET Core 2.0 and Nodejs. Once you have those installed, do a “func –help”, and you’ll see what capabilities are at hand…

Continue reading “Replatforming Azure Functions into an Azure Functions Container”

Serverless On-Demand Scaling : Pushing the pedal when you need it…

Introduction

A lot of workloads are driven by peak consumption. From my experience, there aren’t the amount of workloads that have a constant performance need are in the minority. Now here comes the interesting opportunity when leveraging serverless architectures… Here you only pay for your actual consumption. So if you tweak your architecture to leverage this, then you can get huge gains!

For today’s post, I’ll be using VMchooser once again as an example. A lot has changed since the last post on the anatomy of this application. Here is an updated drawing of the high level architecture ;

Underneath you can see the flow that’ll be used when doing a “Bulk Mapping” (aka “CSV Upload”). The webapp (“frontend”) will store the CSV as a blob on the storage account. Once a new blob arrives, a function will be triggered that will examine the CSV file and put every entry onto a queue. Once a message is published onto the queue, another function will start processing this message. By using this pattern, I’m transforming this job into parallel processing job where each entry is handled (about) simultaneously. The downside of this, is that there will be contention/competition for the back-end resources (being the data store). Luckily, CosmosDB can scale on the fly too… We can adapt the request units as needed; up or down! So let’s do a small PoC and see who this could work…

Continue reading “Serverless On-Demand Scaling : Pushing the pedal when you need it…”

FaaS & Serverless – Vendor lock-in or not? Consider the cost of the full application lifecycle

Introduction

In my current role at Microsoft, I often talk about the possibilities in regards to application modernization. A typical ask in this space is to what kind of service they should use as a underlying platform for their own services. Where this commonly results in a (brief) discussion about VMs vs Containers vs Serverless/FaaS. Today’s post is about my personal take on the matter.

 

Setting the scene

First let’s start with setting the scene a bit… For today I’ll try to focus on the application modernization landscape, where the same goes for the data platform stack. Here you can pretty much interchange “Functions” with “Data Lake Analytics” and “Containers” with “HD Insights”. Though we’ll not go into that detail, in order to reduce the complexity of the post. 😉

When looking towards the spectum, the first thing to acknowledge is the difference in service models. Here we mainly have two service models in play ;

Continue reading “FaaS & Serverless – Vendor lock-in or not? Consider the cost of the full application lifecycle”

Hardening Azure Functions when exposing them via Azure API Management

Introduction

In my discussions with customers about “serverless”, we often talk about the typical security patterns when embarking on the deployment of functions for Enterprise organizations. A typical combination we see here is where Azure API Management is used in front of Azure Functions. Today we’ll talk about the options at hand here. In essence this will related to a choice where an organization will need to choose between “Fully Isolated” and “Full Flexibility”!

Continue reading “Hardening Azure Functions when exposing them via Azure API Management”

Putting Azure API Management in front of an Azure Function API

Introduction

Today’s post will be on how to expose an API hosted via an Azure function via Azure API management. So what are we going to configure today? We’ll expose the function API externally. The “user” (or client app) will authenticate with API management via a “subscription key“. Afterwards API management will call the back-end function, where it will authenticate via the function authentication code.

 

Configuration

So let’s go to our function …

Where we’ll grab the “function URL”. This contains the query parameter “code” which uses the function key as authentication.

Continue reading “Putting Azure API Management in front of an Azure Function API”