In the summer of 2018, the 2nd generation of the Azure Data Lake Storage was announced. In today’s post, we’ll delve into the authentication & authorization part of this service. We’re going to see how we can leverage AAD to tighten security around our Data Lake.
To help us along in this storyline, we’ll be looking to solve the following use case. A customer has stored a lot of data on its Data Lake, and is looking to provide a “partner” access to a subset of that data. In this use case, what would we need to do to achieve this goal?
Azure Data Lake Storage : Access Control Model
The first part of our puzzle is looking at the “Access Control Model”… In essence, there are four ways to provide access to the data lake ;
- Shared Key ; The caller effectively gains ‘super-user’ access, meaning full access to all operations on all resources, including setting owner and changing ACLs
- SAS Tokens ; The token includes the allowed permissions as part of the token. The permissions included in the SAS token are effectively applied to all authorization decisions, but no additional ACL checks are performed.
- Azure RBAC ; Azure Role-based Access Control (RBAC) uses role assignments to effectively apply sets of permissions to users, groups, and service principals for Azure resources. Typically, those Azure resources are constrained to top-level resources (e.g., Azure Storage accounts). In the case of Azure Storage, and consequently Azure Data Lake Storage Gen2, this mechanism has been extended to the file system resource.
- ACL ; And last, but not least, we have the access control list we can apply at a more fine-grained level.
It’s important to note that there is a priority flow deciding what takes precedence over what… Basically, the Shared Key & SAS tokens “always win”, as they are seen as the “super-user”. Next in line is Azure RBAC; if a matching role assignment is found at the RBAC level, access is granted and the ACLs are not even checked. The ACLs are “the last resort”.
So be aware of how this priority flow goes, as this can potentially give you some unwanted side effects, if you were not aware of this! On the other hand, this also provides you with a way to simplify your permission model and see the ACLs as a way to handle “exceptions” (like sharing with a partner…).
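The priority flow above can be sketched as a small decision function. This is a simplified model for illustration only; the real evaluation inside Azure Storage involves far more detail (scopes, inheritance, default ACLs, …) :

```python
def authorize(request):
    """Simplified model of the ADLS Gen2 authorization priority flow.

    'request' is a dict describing the call; returns True when access
    would be granted. Illustrative only.
    """
    # 1. Shared Key: effectively super-user, full access to everything.
    if request.get("shared_key"):
        return True
    # 2. SAS token: the permissions embedded in the token decide;
    #    no additional ACL checks are performed.
    if "sas_permissions" in request:
        return request["operation"] in request["sas_permissions"]
    # 3. Azure RBAC: if a role assignment grants the operation,
    #    access is given and the ACLs are never consulted.
    if request["operation"] in request.get("rbac_permissions", set()):
        return True
    # 4. ACLs: the last resort, evaluated at file/folder level.
    return request["operation"] in request.get("acl_permissions", set())

# A shared key wins regardless of any other setting:
print(authorize({"shared_key": True, "operation": "write"}))   # True
# A SAS token without 'write' blocks a write, even if an ACL would allow it:
print(authorize({"operation": "write", "sas_permissions": {"read"},
                 "acl_permissions": {"write"}}))               # False
```

Note how the second example demonstrates the “unwanted side effect” mentioned above: an earlier stage in the flow short-circuits the later ones.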
Azure Data Lake Storage : AAD Integration (Theory)
When looking at Azure RBAC & ACLs, we can see that they both leverage Azure Active Directory for the authentication (and subsequently the authorization) part. Now let’s take a 10-mile-high view of how this links back to our defined use case…
In regards to the partnership, we’ll be inviting the user we want to grant rights to into our own tenant (via an AAD B2B invite). At that point, we’ll have a kind of “stub user” in our own tenant. The user will still authenticate against its own tenant, though we’ll be able to grant that user rights (“authorization”) to our own resources via that “stub user”.
Azure Data Lake Storage : AAD Integration (Practical)
What does that look like in reality? Here you can see a (B2B) user (from the partner tenant, “kvaes.be”) that resides in the “customer tenant” (microsoft.com). You can see that the source column indicates that this user comes from an external AAD tenant.
If we were to grant this user permissions on the Azure management pane (Azure Resource Manager), we would be able to resolve this user’s “UPN” (User Principal Name = firstname.lastname@example.org). That being said, ADLS Gen2 handles that part a bit differently: it’s not able to resolve (“translate”) the UPN when granting the permissions at ACL level. To grant the permissions anyway, browse to the user’s object in the AAD tenant. Once found, copy its “Object ID” as follows ;
Now you can use this Object ID in order to define the ACLs on the ADLS. Let’s take a look how that would look in Azure Storage Explorer ;
Once added, you’ll see that the object id does get resolved/translated. And you’ll notice that it’s the same weird B2B format you might already be accustomed to. 😉
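Under the hood, a named-user ACL entry follows the POSIX-style `user:<object-id>:<permissions>` format. A minimal helper to illustrate (the object id below is a placeholder, and the commented-out `set_access_control` call assumes the `azure-storage-file-datalake` SDK; treat it as an untested sketch, not a verified snippet):

```python
def acl_entry(object_id: str, perms: str) -> str:
    """Build a named-user ACL entry in the 'user:<oid>:<rwx>' form
    used by ADLS Gen2 access control lists."""
    assert len(perms) == 3 and all(c in "rwx-" for c in perms)
    return f"user:{object_id}:{perms}"

# Placeholder object id of our B2B "stub user":
entry = acl_entry("00000000-0000-0000-0000-000000000000", "r-x")
print(entry)  # user:00000000-0000-0000-0000-000000000000:r-x

# With the azure-storage-file-datalake SDK, the entry could then be
# applied roughly like this (sketch only, requires credentials):
# directory_client.set_access_control(
#     acl=f"user::rwx,group::r-x,other::---,{entry}")
```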
Azure Data Lake Storage : POSIX Rights (Theory)
Now that we can leverage that partner user, we can start setting up ACLs on our data. So let’s take a look at how the permission structure works ;
Which translates to the typical POSIX number style ;
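For reference, the mapping from the `rwx` flags to the octal digit is simply the sum of r=4, w=2, x=1. A tiny helper to illustrate:

```python
def perms_to_digit(perms: str) -> int:
    """Translate an 'rwx'-style triplet into its POSIX octal digit
    (r=4, w=2, x=1; '-' contributes 0)."""
    values = {"r": 4, "w": 2, "x": 1}
    return sum(values.get(flag, 0) for flag in perms)

print(perms_to_digit("rwx"))  # 7 (full access)
print(perms_to_digit("r--"))  # 4 (read only)
print(perms_to_digit("--x"))  # 1 (traverse only)
```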
Which sounds very nice, but what kind of trail do we need to set up?
If we want to read “Data.txt”, then we need to grant “R” (4) on Data.txt… But also grant “X” (1) on the full path up to that file.
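In pseudo-terms: reading a file requires “x” (traverse) on every folder in the path, plus “r” on the file itself. A small sketch of that rule:

```python
def can_read_file(folder_perms: list, file_perms: str) -> bool:
    """True when the caller holds 'x' on every folder leading to the
    file AND 'r' on the file itself -- the minimal set of ACLs needed
    to read a file in ADLS Gen2."""
    return all("x" in p for p in folder_perms) and "r" in file_perms

# Reading /folder-a/folder-b/Data.txt needs --x on both folders
# and r-- on Data.txt:
print(can_read_file(["--x", "--x"], "r--"))  # True
print(can_read_file(["---", "--x"], "r--"))  # False: traversal blocked
print(can_read_file(["--x", "--x"], "---"))  # False: no read on the file
```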
Azure Data Lake Storage : AAD Integration & B2B – An example for our use case – Test scenario
Now let’s delve into this a bit deeper… Let’s create the following structure ;
- File System
- fully-shared (RWX / 7)
- partially-shared (--X / 1)
- private (--- / 0)
Inside each of those folders, we’re going to set up a file called “azcopy” (of 21MB) with the ACL set to “R--” (4)
Azure Data Lake Storage : AAD Integration & B2B – An example for our use case – User experience from Azure Storage Explorer
For our first test, let’s check what the user experience will be from Azure Storage Explorer… First, I’ll log out the user from the “Customer Tenant”.
Though before logging in with my “B2B user”, we’ll need to make sure that the user “sees” the storage account. This might be a bit of a mind fart at first… Azure Storage Explorer only shows the storage accounts that are visible to your user at an RBAC level. To achieve this, without interfering with the priority flow discussed earlier, I granted the B2B user the “Reader” role at storage account level.
So let’s continue and login with the user from the “Partner Tenant” (B2B user) ;
Once done, it’ll show our storage account. And now let’s try to access our folders ;
- fully-shared : This one will show our “azcopy” file, and we’ll be able to read (~download) the file accordingly.
- partially-shared ; This one will return an error…
- private ; This one will return an error too!
Here you might wonder why the “partially-shared” one gives us an error… We gave the folder +x (1) to allow traversal, and the file has +r (4) to allow reads. Here we’re faced with the logic of Azure Storage Explorer: in order to show the file, it wants to list the contents of the folder. As it was “only” granted +x to allow traversal, and not +r to list the contents, it fails while trying to obtain the file list.
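The difference boils down to two distinct operations: listing a folder needs both “r” and “x” on that folder, while reading a file by its full path only needs “x” on the folder plus “r” on the file. A sketch of why a tool that lists first (like Storage Explorer) fails where a direct download succeeds:

```python
def can_list(folder_perm: str) -> bool:
    """Listing a folder's contents requires both read and execute
    on that folder."""
    return "r" in folder_perm and "x" in folder_perm

def can_read_directly(folder_perm: str, file_perm: str) -> bool:
    """Reading a file via its full path only needs traverse ('x')
    on the folder and read ('r') on the file."""
    return "x" in folder_perm and "r" in file_perm

# Our 'partially-shared' folder (--X / 1) holding the 'azcopy' file (R-- / 4):
print(can_list("--x"))                  # False: listing the folder fails
print(can_read_directly("--x", "r--"))  # True: a direct download would work
```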
Azure Data Lake Storage : AAD Integration & B2B – An example for our use case – User experience from AzCopy
For our next user flow, we’re going to see what the user experience would be if we leveraged AzCopy. For this, note down the DFS endpoint. This can be done from the Storage Explorer (as shown), or you can also get it from the properties of the Azure Storage Account in the Azure Portal.
Now let’s do a login, by using the command line statement ; “azcopy login”
From a browser session, let’s do the device login sequence.
Entering our code
And authenticating… (with our B2B user!)
And… boom… we’re signed in.
Next up, we’re going to download the azcopy file from our “fully-shared” folder ; “azcopy copy https://storageaccount.dfs.core.windows.net/fully-shared/azcopy ./fullyshared”
And … FAILED! So what went wrong? Behind the scenes, AzCopy uses the tenant (id) information from the partner tenant. So we need to specify the tenant id of the customer tenant when doing the login. This directory id can be found in the properties menu of your AAD tenant. Let’s copy it…
And we’re going to use it in our login sequence ; “azcopy login --tenant-id *customer-tenant-id*”
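The reason the tenant id matters is that the AAD authority endpoint is tenant-specific: without an explicit tenant id, the login resolves against the user’s home (“partner”) tenant rather than the customer tenant where the permissions were granted. As a hedged illustration of how such an authority URL is composed (the `login.microsoftonline.com` base is the standard AAD endpoint; the tenant id below is a placeholder, and “organizations” is just one example of a multi-tenant default):

```python
def aad_authority(tenant_id: str = "organizations") -> str:
    """Compose an Azure AD authority URL. Without an explicit tenant
    id, a multi-tenant default (here 'organizations') is used, which
    resolves to the signing-in user's home tenant."""
    return f"https://login.microsoftonline.com/{tenant_id}"

# Placeholder customer tenant id (not a real tenant):
print(aad_authority("11111111-2222-3333-4444-555555555555"))
print(aad_authority())  # falls back to the home tenant
```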
And let’s try downloading our “azcopy” file again from the “fully-shared” folder ;
And now it works! Let’s try the private one…
Nope, we’re not allowed to traverse the private folder, despite having read rights on the file itself! Next up: the “partially-shared” one, where we have traversal rights on the folder, but not the permission to list its contents.
And boom, that one works too! While this flow did not work via Azure Storage Explorer, it does work when we access the path directly (as we knew the full path). We have now downloaded two files; the private one was still inaccessible to us (with due reason!).
Azure Data Lake Storage : AAD Integration & B2B – The impact of “read” on folder level with “private” files
Now let’s extend our example setup a bit. I’ve uploaded a file called “secret.pdf” to our “fully-shared” directory. So we’re able to traverse AND list the contents of the folder, though I’ve not granted our B2B user any permissions on the file itself.
Now let’s see if we can download the file via Azure Storage Explorer.
The transfer procedure is started…
And it fails!
Though the odd thing is that a file does appear locally on my file system… albeit empty in terms of content! Now let’s attempt the same thing from AzCopy ;
And that one fails too!
So what did we see today?
- Grant partners rights on an individual basis by leveraging a B2B user
- Granting permissions to B2B users is done by leveraging the object id of that B2B user from the “customer tenant”
- Specify the tenantid of the “customer tenant” when logging in with AzCopy
- If you only want to grant read access to a file, grant the traversal permission (+x) on all folders in the path, and read (+r) on the file
- If you granted list on the folder level, you’ll be able to see the file names of files which were not meant for your eyes
- If you have not granted list at folder level, you won’t be able to see the files in the folder (despite having read access to an individual file in that folder)
- Access Control Model
- There is a priority flow, where the ACLs are given the lowest priority
- Try to avoid the use of Access Keys & SAS Tokens, as they’ll give you “the keys to the kingdom”