Earlier we set up a basic IoT flow in which we captured temperature & humidity and stored it in various outputs. My objective for this week was to create a new flow that would leverage one of those outputs and run anomaly detection on the data received. As this detection might take some time, I did not want to do it “in-line” with my current flow. So I’ve added a new one… which kinda looks like this.
The details of the Machine Learning part in combination with Stream Analytics will be for another post, as I’m still struggling a bit to get it fully operational. 😉 So today we’ll “just” cover the Machine Learning aspect of the flow.
To be very clear up front… I’m by no means an expert at machine learning / big data / etc. In my quest to learn, I played around with Azure Machine Learning Studio, and I would like to share my experience with it. 😉
What will we be doing today?
- Train a model, which we will use to detect anomalies in the data we want to test.
- Set up a service flow that can test data against the trained model.
- Publish (“deploy”) this service so that we can use it from other services (like Stream Analytics!).
Azure Machine Learning Studio
Once we have deployed a “Machine Learning Workspace” in Azure, we can see three links:
- Launch Machine Learning Studio: to create experiments, train models, etc.
- Launch Machine Learning Gallery: if you need some inspiration…
- Launch Machine Learning Web Service Management: here we can deploy / manage services (from saved experiments).
We’ll first train a model & create the service. For this we’ll be using the Machine Learning Studio.
Here you can see that I created two experiments: one to train my model, and another to create a service that leverages this model.
The GUI is very user-friendly and already provides a lot of modules out of the box. And if you need some extra juice… you can run your own R or Python scripts:
Training the model
So we’ll start by training a model that will be used for the anomaly detection. The outcome of my endeavours was the following:
- Import the data(set), using all historical data stored in my Azure Table Storage.
This results in the following:
- I’ll reduce the number of fields presented to the training module, as these are also the fields the service will expect later on.
- The training is done by a “Train Anomaly Detection Model”-module, which is fed the dataset and a learning module.
- For the learning module, we’ll use “PCA-Based Anomaly Detection”. As I know the number of columns, we’ll use the “Single Parameter” training mode and set it to the number of columns. Not sure what to use? Check the documentation… 😉
- Afterwards, we do a test run by scoring our dataset with the trained model.
You can see that two columns were added:
- Scored Labels
- Scored Probabilities
These indicate the probability of this event being “correct”, so a low value indicates an anomaly…
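To get a feel for what the training and test run above are doing, here is a rough sketch of the general idea behind PCA-based anomaly detection in Python with NumPy. It is not Azure's implementation: the data is synthetic (humidity made to loosely follow temperature), the names `fit_pca` and `anomaly_scores` are hypothetical, and the score is a plain reconstruction error, so in this sketch a *higher* score means “more unusual”. It does not reproduce how Azure scales its Scored Probabilities column. Note that the sketch keeps fewer components than columns, because with as many components as columns every row is reconstructed exactly and all scores drop to zero.

```python
import numpy as np


def fit_pca(X, n_components):
    """Learn mean, scale and the top principal directions from historical readings."""
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    std[std == 0] = 1.0  # guard against constant columns
    Z = (X - mean) / std  # standardise so temperature and humidity are comparable
    # Rows of vt are the principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(Z, full_matrices=False)
    return {"mean": mean, "std": std, "components": vt[:n_components]}


def anomaly_scores(model, X):
    """Distance of each standardised row from the learned principal subspace."""
    Z = (X - model["mean"]) / model["std"]
    P = model["components"]
    reconstructed = Z @ P.T @ P  # project onto the subspace and back
    return np.linalg.norm(Z - reconstructed, axis=1)


# Synthetic stand-in for the historical table: humidity roughly follows temperature.
rng = np.random.default_rng(42)
temperature = rng.normal(21.0, 1.5, size=500)
humidity = 60.0 - 2.0 * (temperature - 21.0) + rng.normal(0.0, 0.5, size=500)
history = np.column_stack([temperature, humidity])

# Keep fewer components than columns, otherwise every row is reconstructed exactly.
model = fit_pca(history, n_components=1)

# "Test run": score the history plus one reading that breaks the usual relation.
readings = np.vstack([history, [[21.0, 75.0]]])
scores = anomaly_scores(model, readings)
threshold = np.quantile(anomaly_scores(model, history), 0.99)
labels = scores > threshold  # rough analogue of a binary "anomaly" label

print(int(scores.argmax()), bool(labels[-1]))
```

The threshold at the 99th percentile of the training scores is an arbitrary choice for illustration; in practice you would tune it against data you know to be normal.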
Once we are happy with the results, we right-click the “Train… Model”-module and select “Save as Trained Model”:
Follow that flow… and it will appear in the “Trained Models”-section:
So we’re all ready to use this model in our service!
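The split between one experiment that trains and saves a model and another that only loads it for scoring can be mimicked in plain Python. The sketch below is an analogy, not how Azure stores trained models: it writes the learned PCA parameters to a JSON file (the file name `trained_model.json` and the helpers `save_model`, `load_model` and `score` are hypothetical) and then scores new readings from the reloaded copy without retraining. The data is synthetic.

```python
import json
import tempfile
from pathlib import Path

import numpy as np


def save_model(model, path):
    """Write the learned arrays to JSON (a stand-in for "Save as Trained Model")."""
    payload = {name: value.tolist() for name, value in model.items()}
    Path(path).write_text(json.dumps(payload))


def load_model(path):
    """Read the arrays back; the scoring side never retrains."""
    payload = json.loads(Path(path).read_text())
    return {name: np.asarray(value) for name, value in payload.items()}


def score(model, X):
    """Reconstruction error against the saved principal subspace (higher = more unusual)."""
    Z = (X - model["mean"]) / model["std"]
    P = model["components"]
    return np.linalg.norm(Z - Z @ P.T @ P, axis=1)


# Training experiment: fit once on (synthetic) historical readings.
rng = np.random.default_rng(0)
t = rng.normal(21.0, 1.5, size=300)
history = np.column_stack([t, 60.0 - 2.0 * (t - 21.0) + rng.normal(0.0, 0.5, size=300)])
mean, std = history.mean(axis=0), history.std(axis=0)
_, _, vt = np.linalg.svd((history - mean) / std, full_matrices=False)
trained = {"mean": mean, "std": std, "components": vt[:1]}

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "trained_model.json"  # hypothetical file name
    save_model(trained, path)

    # Service experiment: load the saved model and score new readings with it.
    service_model = load_model(path)
    new_readings = np.array([[21.5, 59.0], [21.0, 75.0]])
    scores = score(service_model, new_readings)

print(scores)
```

Keeping training and scoring apart like this means the service only needs the saved parameters, which is the same idea as reusing the saved trained model in a second experiment.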
Creating the service
Next up, we’ll create the service… which will have the following topology:
When you take a look at this picture, you can notice a “toggle” just above “Save”. This allows you to switch between the “experiment” and “web service” views. Depending on the view, the input & output will be different…
The “grayed out” blocks (“web service input” and “web service output”) are only active in the “web service”-view, whereas the “import” data flow is only relevant in the “experiment”-view.
What else do we see here? Kinda the same as in the part where we trained the model. The only difference is that we do the scoring via the trained model we saved.
Once we are happy with the results, we click the “deploy web service”-button, which takes us through a nice wizard…
Managing the service
In the “Azure Machine Learning Web Services”-portal, you can see the web services you have published and the plans that are active. If you go to “Web Services”, you can select your web service…
Here you can test your service. Let’s try it…
And that works great!
Want to test it out from another application? Go to the “Consume”-tab and get the information you need! 😉
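As an example of calling the service from your own code, here is a minimal Python sketch using only the standard library. It assumes the request-response layout used by classic Studio web services (an `Inputs` object keyed by the input name, each with `ColumnNames` and `Values`, plus `GlobalParameters`) and an API key sent as a Bearer token. The URL, key, the input name `input1` and the column names `temperature`/`humidity` are placeholders or assumptions: take the real values from the “Consume”-tab and the request sample shown there, and make sure the columns match the ones you kept when training.

```python
import json
import urllib.request

# Placeholders: copy the real endpoint and key from the "Consume" tab of your service.
SERVICE_URL = (
    "https://<region>.services.azureml.net/workspaces/<workspace-id>"
    "/services/<service-id>/execute?api-version=2.0&details=true"
)
API_KEY = "<your-api-key>"


def build_request_body(column_names, rows, input_name="input1"):
    """Build a request-response payload in the (assumed) classic Studio JSON layout."""
    return {
        "Inputs": {
            input_name: {
                "ColumnNames": list(column_names),
                "Values": [list(row) for row in rows],
            }
        },
        "GlobalParameters": {},
    }


def score_readings(rows, url=SERVICE_URL, api_key=API_KEY):
    """POST the readings to the web service and return the decoded JSON response."""
    body = json.dumps(build_request_body(["temperature", "humidity"], rows)).encode("utf-8")
    request = urllib.request.Request(
        url,
        data=body,  # supplying data makes urllib send a POST
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer " + api_key,
        },
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Only print the payload; call score_readings() once the placeholders are filled in.
    payload = build_request_body(["temperature", "humidity"], [[21.5, 59.0]])
    print(json.dumps(payload, indent=2))
```

Printing the payload first is a cheap way to compare it against the sample request on the “Consume”-tab before sending anything.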
Bearing in mind that this was my first machine learning experiment, I think the output is still nice, so I can conclude that this service is quite user-friendly. One of the handiest features I found was that each step/module can visualize the dataset at that point, so you can “debug” very easily… Despite being a “noob” at this, I must say I’m very impressed & inspired to work more with it.