Introduction
Earlier this week Laure showed me an awesome SDK called Presidio, which provides context-aware, pluggable and customizable data protection and anonymization for text and images. It proved very useful for a use case we were working on. In today's post, we'll look at how you can use App Service and Logic Apps to build your own Presidio demo. If you want to try things straight away, check out the demo website maintained by the Presidio team: https://presidio-demo.azurewebsites.net/.
What does it do?
In essence, there are two steps involved:
- Analysis: an NLP-based analyzer looks for sensitive data and returns a list of the locations where it found it (entity type, start and end offset, and a confidence score).
- Anonymization: that list of locations is then used to replace, mask or otherwise remove the sensitive information.
The result is that sensitive data such as credit card numbers, locations, names, social security numbers (SSNs), bitcoin wallets, phone numbers and financial data can be kept confidential, as you can see in the following example.
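Conceptually, the anonymization step boils down to swapping character ranges for a placeholder. Below is a minimal bash sketch of that idea, not Presidio's actual implementation: it assumes the analyzer returned non-overlapping start:end offsets (end exclusive), and the function name replace_spans is hypothetical.

```bash
#!/usr/bin/env bash
# Conceptual sketch of a "replace" anonymizer (not Presidio's implementation).
# Arguments: the text, the placeholder, then one start:end pair per detected
# entity (end is exclusive), e.g. 0:10. Assumes spans do not overlap.
replace_spans() {
  local text="$1" placeholder="$2" start end
  shift 2
  # Apply spans from the highest start offset down, so replacing one span
  # never shifts the offsets of the spans that are still to be processed.
  while IFS=: read -r start end; do
    [[ -n $start ]] || continue   # no spans were passed
    text="${text:0:$start}${placeholder}${text:$end}"
  done < <(printf '%s\n' "$@" | sort -t: -k1,1nr)
  printf '%s\n' "$text"
}

# prints: ANONYMIZED drivers license is ANONYMIZED
replace_spans "John Smith drivers license is AC432223" "ANONYMIZED" 0:10 30:38
```

Working from the end of the string backwards is the design choice that matters here: a placeholder that is longer or shorter than the original value would otherwise invalidate every offset that follows it.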
So what about that proof-of-concept setup?
For the proof of concept, we used the following setup:
We had an App Service plan with two web apps running on it: one ran the analyzer and the other the anonymizer. Both were deployed as container images, with the container's HTTP endpoint exposed to App Service. Because the process takes two steps, we used Logic Apps to orchestrate the job. The Logic App serves as the HTTP(S) endpoint for the outside world and returns the results from the anonymizer, so the consumer does not need to know about the workflow behind it. The consumer "just" sends their data to be anonymized and gets the anonymized result back.
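The two-step flow the Logic App orchestrates can be sketched as two chained HTTP calls. The bash sketch below is a hypothetical stand-in for the Logic App, handy for calling the web apps directly while debugging. It assumes the /analyze and /anonymize endpoints used in the workflow definition later in this post, default URLs matching the web app names used below, and input text that needs no JSON escaping; the helper names are illustrative, and only the DEFAULT replace rule is sent.

```bash
#!/usr/bin/env bash
# Hypothetical stand-in for the Logic App: analyze first, then anonymize.
# The URLs default to the web app names used in this post; override as needed.
ANALYZER_URL="${ANALYZER_URL:-https://pocanalyzer.azurewebsites.net}"
ANONYMIZER_URL="${ANONYMIZER_URL:-https://pocanonimizer.azurewebsites.net}"

# Body for /analyze. Assumes the text contains no characters that need JSON escaping.
analyze_body() {
  printf '{"text": "%s", "language": "%s"}' "$1" "$2"
}

# Body for /anonymize: the original text plus the analyzer's JSON array of results.
anonymize_body() {
  printf '{"text": "%s", "analyzer_results": %s, "anonymizers": {"DEFAULT": {"type": "replace", "new_value": "ANONYMIZED"}}}' "$1" "$2"
}

anonymize_text() {
  local text="$1" language="${2:-en}" results
  # Step 1: ask the analyzer where the sensitive entities are.
  results=$(curl -sS --fail -X POST "$ANALYZER_URL/analyze" \
    -H "Content-Type: application/json" \
    --data "$(analyze_body "$text" "$language")") || return 1
  # Step 2: hand those locations to the anonymizer and print its JSON response.
  curl -sS --fail -X POST "$ANONYMIZER_URL/anonymize" \
    -H "Content-Type: application/json" \
    --data "$(anonymize_body "$text" "$results")"
}

# Usage: anonymize_text "John Smith drivers license is AC432223" en
```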
How do I deploy those?
You can easily deploy the back-end APIs with the following Azure CLI commands:
# Based on: https://microsoft.github.io/presidio/installation/ and https://microsoft.github.io/presidio/samples/deployments/app-service/
# setting the variables
RESOURCE_GROUP=kvaes-poc-confidential
APP_SERVICE_NAME=proofofconcept
LOCATION=westeurope
APP_SERVICE_SKU=P1V3
WEBAPP_ANALYZER_NAME=pocanalyzer
WEBAPP_ANONIMIZER_NAME=pocanonimizer
IMAGE_ANALYZER=mcr.microsoft.com/presidio-analyzer
IMAGE_ANONIMIZER=mcr.microsoft.com/presidio-anonymizer
# create the resource group
az group create --name $RESOURCE_GROUP --location $LOCATION
# create the app service plan
az appservice plan create --name $APP_SERVICE_NAME --resource-group $RESOURCE_GROUP --is-linux --location $LOCATION --sku $APP_SERVICE_SKU
# create the web apps using the official Presidio images
az webapp create --name $WEBAPP_ANALYZER_NAME --plan $APP_SERVICE_NAME --resource-group $RESOURCE_GROUP -i $IMAGE_ANALYZER
az webapp create --name $WEBAPP_ANONIMIZER_NAME --plan $APP_SERVICE_NAME --resource-group $RESOURCE_GROUP -i $IMAGE_ANONIMIZER
# set the container ports
az webapp config appsettings set --resource-group $RESOURCE_GROUP --name $WEBAPP_ANALYZER_NAME --settings PORT="8080"
az webapp config appsettings set --resource-group $RESOURCE_GROUP --name $WEBAPP_ANONIMIZER_NAME --settings PORT="8080"
# restart both web apps
az webapp restart --resource-group $RESOURCE_GROUP --name $WEBAPP_ANALYZER_NAME
az webapp restart --resource-group $RESOURCE_GROUP --name $WEBAPP_ANONIMIZER_NAME
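Pulling the container images can take a few minutes after the restart. The sketch below polls each app until it answers. It assumes the images expose the /health route that the Presidio analyzer and anonymizer services define, and that the apps are reachable at their default <name>.azurewebsites.net host names; wait_for_health is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Hypothetical helper: poll a URL until it returns a success status,
# retrying every 10 seconds for at most the given number of attempts.
wait_for_health() {
  local url="$1" attempts="${2:-30}" i
  for ((i = 1; i <= attempts; i++)); do
    # --fail makes curl return non-zero on HTTP errors (e.g. 502 while the container starts).
    if curl -sS --fail --max-time 10 "$url" >/dev/null 2>&1; then
      echo "up: $url"
      return 0
    fi
    sleep 10
  done
  echo "not reachable after $attempts attempts: $url" >&2
  return 1
}

# Usage, with the variables from the deployment commands above:
# wait_for_health "https://$WEBAPP_ANALYZER_NAME.azurewebsites.net/health"
# wait_for_health "https://$WEBAPP_ANONIMIZER_NAME.azurewebsites.net/health"
```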
On the Logic Apps side, you can use the following workflow definition to get it working. You will, of course, need to update the azurewebsites.net URLs to match the web apps you just created.
{
"definition": {
"$schema": "https://schema.management.azure.com/providers/Microsoft.Logic/schemas/2016-06-01/workflowdefinition.json#",
"actions": {
"Condition": {
"actions": {
"API_Call_-_Analysis": {
"inputs": {
"body": {
"language": "@{triggerBody()?['language']}",
"text": "@{triggerBody()?['text']}"
},
"headers": {
"Content-type": "application/json"
},
"method": "POST",
"uri": "https://pocanalyzer.azurewebsites.net/analyze"
},
"runAfter": {},
"type": "Http"
},
"API_Call_-_Anonymization": {
"inputs": {
"body": {
"analyzer_results": "@body('API_Call_-_Analysis')",
"anonymizers": {
"DEFAULT": {
"new_value": "ANONYMIZED",
"type": "replace"
},
"PHONE_NUMBER": {
"chars_to_mask": 4,
"from_end": true,
"masking_char": "*",
"type": "mask"
}
},
"text": "@{triggerBody()?['text']}"
},
"headers": {
"Content-type": "application/json"
},
"method": "POST",
"uri": "https://pocanonimizer.azurewebsites.net/anonymize"
},
"runAfter": {
"API_Call_-_Analysis": [
"Succeeded"
]
},
"type": "Http"
},
"Response_-_Output_of_anonymization": {
"inputs": {
"body": "@body('API_Call_-_Anonymization')",
"statusCode": "@outputs('API_Call_-_Anonymization')['statusCode']"
},
"kind": "Http",
"runAfter": {
"API_Call_-_Anonymization": [
"Succeeded"
]
},
"type": "Response"
}
},
"else": {
"actions": {
"Response_-_400_-_No_text_supplied": {
"inputs": {
"body": "No text was supplied",
"statusCode": 400
},
"kind": "Http",
"runAfter": {},
"type": "Response"
}
}
},
"expression": {
"and": [
{
"not": {
"equals": [
"@triggerBody()?['text']",
""
]
}
}
]
},
"runAfter": {},
"type": "If"
}
},
"contentVersion": "1.0.0.0",
"outputs": {},
"parameters": {},
"triggers": {
"manual": {
"inputs": {
"schema": {
"properties": {
"language": {
"type": "string"
},
"text": {
"type": "string"
}
},
"type": "object"
}
},
"kind": "Http",
"type": "Request"
}
}
},
"parameters": {}
}
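The else branch of the Condition answers with HTTP 400 and "No text was supplied" when the text field is empty. Once you have the Logic App's trigger URL, a quick way to exercise that branch is sketched here; expect_empty_text_rejected is a hypothetical helper, and the URL you pass in is your own trigger URL.

```bash
#!/usr/bin/env bash
# Hypothetical check of the Condition's else branch: an empty "text"
# should be rejected with HTTP 400 before any API is called.
expect_empty_text_rejected() {
  local url="$1" status
  # -o /dev/null discards the body; -w prints only the HTTP status code.
  status=$(curl -s -o /dev/null -w '%{http_code}' -X POST "$url" \
    -H "Content-Type: application/json" \
    --data '{"text": "", "language": "en"}')
  if [ "$status" = "400" ]; then
    echo "empty text rejected as expected"
  else
    echo "unexpected status: $status" >&2
    return 1
  fi
}

# Usage (quote the URL, it contains & characters):
# expect_empty_text_rejected "https://<your trigger URL>"
```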
How do I test it?
In the Logic App, you can see which URL it listens on.
Use that URL to run the following curl command, for instance:
curl -X POST "https://prod-86.westeurope.logic.azure.com:443/workflows/c****************/triggers/manual/paths/invoke?api-version=2016-10-01&sp=%2Ftriggers%2Fmanual%2Frun&sv=1.0&sig=************" -H "Content-type: application/json" --data '{"text": "John Smith drivers license is AC432223", "language": "en"}'
This returns the following output:
{"text":"ANONYMIZED drivers license is ANONYMIZED","items":[{"start":30,"end":40,"entity_type":"US_DRIVER_LICENSE","text":"ANONYMIZED","operator":"replace"},{"start":0,"end":10,"entity_type":"PERSON","text":"ANONYMIZED","operator":"replace"}]}
As you can see, we sent the sentence "John Smith drivers license is AC432223" and got back "ANONYMIZED drivers license is ANONYMIZED". The sensitive information is nicely redacted!
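The start and end values in the items line up with positions in the returned text rather than the original sentence. In the input, AC432223 occupies positions 30 to 38, but once it was replaced by the 10-character ANONYMIZED the item reports 30 to 40. A quick bash check against the sample response (item_text is a hypothetical helper):

```bash
#!/usr/bin/env bash
# Anonymized text copied from the sample response above.
out="ANONYMIZED drivers license is ANONYMIZED"

# Bash substring expansion is ${var:offset:length}, so length = end - start.
item_text() {
  local start="$1" end="$2"
  printf '%s\n' "${out:$start:$((end - start))}"
}

item_text 0 10    # PERSON item, prints ANONYMIZED
item_text 30 40   # US_DRIVER_LICENSE item, prints ANONYMIZED
```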
How can I use this?
Generally speaking, you can use this at two points in a workflow:
- Before you store the data, to ensure it is stored in anonymized form. The advantage is that the information cannot leak; the downside is that you can no longer retrieve it.
- Before you make the data available for consumption, where your app's API anonymizes the data before returning it. Depending on a given set of business rules, you can then choose to keep the data confidential in full, partially, or not at all.
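For the second option, business rules could translate into different anonymizers configurations for the /anonymize call. The sketch below is purely illustrative: the level names full, partial and none and the helper anonymizers_for are hypothetical, while the replace and mask settings are the ones used in the Logic App definition above.

```bash
#!/usr/bin/env bash
# Hypothetical mapping from a business-rule level to the "anonymizers" JSON
# sent to /anonymize. For "none" it prints nothing: the caller should then
# skip the anonymizer call and return the data as-is.
anonymizers_for() {
  case "$1" in
    full)    echo '{"DEFAULT": {"type": "replace", "new_value": "ANONYMIZED"}}' ;;
    partial) echo '{"DEFAULT": {"type": "mask", "masking_char": "*", "chars_to_mask": 4, "from_end": true}}' ;;
    none)    return 0 ;;
    *)       echo "unknown level: $1" >&2; return 1 ;;
  esac
}
```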
Closing Thoughts
Data protection and anonymization is not an easy topic. What I like about Presidio is that it makes this topic accessible! Having an SDK that you can infuse into your application or workflow is an awesome thing. And since the SDK is provided under the MIT license, you get a very permissive software license to integrate with.