In an earlier blog post I discussed the decision criteria for selecting a VM. In that post I also showed a tool called “VMchooser”. Today’s post covers the architecture I used to build it. As you might have guessed, it’s built on Azure components. Let’s get to it and check the anatomy of this application.
High Level Architecture
VMchooser has the following high-level architecture:
- Web App : The front-end of the application is hosted on an Azure Web App.
- Azure Functions : The back-end API & batch parser are built with Azure Functions, which unlocks insane scaling possibilities.
- Storage Account : The storage account serves as a decoupled, central storage component for the batch parsing. It could also be used for hosting the “database” (flat file).
- Application Insights : Application Insights provides the needed insights into usage & other metrics.
- GitHub : All code for this project is open source and publicly hosted. You can run your own VMchooser if you want… 😉 Every change is immediately pushed to the front-end, back-end & database.
- API Management : As the back-end API is decoupled from the application, I’ve also linked this API to API Management. This gives me the option to allow third-party application integrations via an API subscription plan.
As mentioned, the webapp is used to serve the front-end.
And as I’m a bit of a cheap-ass, I’ve deployed about 13 webapps on a single app service plan… Here I’m even using a “B1” plan, which costs me about 27€ per month.
I’ve set up the GitHub integration, so I get continuous deployment whenever I push changes to GitHub.
My code is public, but my secrets are not…
These I insert into the application as environment variables.
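To make the idea concrete, here is a minimal sketch of reading secrets from environment variables and failing fast when one is missing. The variable names are hypothetical; the post does not say which settings VMchooser actually uses, and its own code is written in other languages.

```python
import os

# Hypothetical setting names; the real application's variable names are not given in the post.
REQUIRED_SECRETS = ("VMCHOOSER_API_KEY", "VMCHOOSER_STORAGE_CONNECTION")


def load_secrets(env=os.environ, required=REQUIRED_SECRETS):
    """Return the required secrets from the environment, failing fast if any is missing."""
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in required}
```

Because the secrets only live in the environment of the running app, the public repository never has to contain them.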
For the back-end, I leverage “Function Apps”, or “serverless” to use the market(ing) name for it.
You can choose to go for a consumption plan or link it to an existing app service plan. For the moment I’ve linked it to my app service plan (remember, the 27€ thingie that already has 13 web apps in it). Though, if this thing ever goes viral, I can easily move it to a consumption plan and scale insanely beyond my own app service plan.
Next to that, just as with my web app, I’ve also set up GitHub integration.
And I’m also using environment variables for my secrets.
Do note that I’ve increased the timeout, as my parser function sometimes needs a few minutes if you feed it a few thousand VMs. 😉
Here I can also monitor the runs of my function.
For those who are new to functions… Click on test and open up the output log down below!
Now you can do live tests of your function and see what shows up in the log & output. I really love that part! And for the API Management integration part, it’s best to also provide a Swagger (API) definition.
So why did I pick functions?
- Flexibility & Scalability : The ability to scale insanely, even though I start on a budget within my own service plan.
- Choose your language per function : GetBestMatch is made with Node.js, as an existing library already did a lot for me, whereas the parser is written in PHP, as that provided me with the smoothest path.
- Management : Logging, testing & integrations out-of-the-box. No need to maintain servers…
The storage account serves as a base for the batch job handling. New uploads are saved into the “input” container. The ParseCsv function is then triggered by the presence of a new blob in this container. The output of the parser is stored in the “output” container. The webapp checks for the existence of this file and returns the location of that blob.
There is still room for optimizations in this area. Like, for instance, using SAS tokens for an enhanced security profile.
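The batch flow described above (upload to “input”, blob-triggered parse, result in “output”, web app checks for the result) can be sketched with an in-memory stand-in for the storage account. Everything here is illustrative: `InMemoryBlobStore`, the CSV columns and the “parsed” status are assumptions, and a real blob trigger runs asynchronously rather than inline as it does in this sketch.

```python
import csv
import io


class InMemoryBlobStore:
    """Stand-in for a storage account: container name -> {blob name: text}."""

    def __init__(self):
        self.containers = {"input": {}, "output": {}}
        self.triggers = {}

    def on_new_blob(self, container, handler):
        """Register a handler that runs whenever a blob lands in the container."""
        self.triggers[container] = handler

    def upload(self, container, name, text):
        self.containers[container][name] = text
        handler = self.triggers.get(container)
        if handler:
            # A real blob trigger fires asynchronously; here it runs inline.
            handler(self, name, text)

    def exists(self, container, name):
        return name in self.containers[container]


def parse_csv(store, name, text):
    """Hypothetical parser: reads the uploaded rows and writes a result blob."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["name", "status"])
    for row in csv.DictReader(io.StringIO(text)):
        writer.writerow([row["name"], "parsed"])
    store.upload("output", name, out.getvalue())


def result_location(store, name):
    """Web app side: return the output blob location once it exists, else None."""
    return f"output/{name}" if store.exists("output", name) else None


store = InMemoryBlobStore()
store.on_new_blob("input", parse_csv)
store.upload("input", "job1.csv", "name,cores\nvm1,2\nvm2,4\n")
print(result_location(store, "job1.csv"))
```

The point of the design is that the web app and the parser never call each other directly; they only share the two containers.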
All code is published to GitHub. So you can roll your own “VMchooser” if you wish… 😉 Anyhow, all changes are immediately published to the webapp / functions. At this time, I’m even doing a direct query on the flat (csv) file that is stored on GitHub.
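As an illustration of what querying a flat CSV “database” can look like, here is a small sketch that picks the cheapest row meeting some minimum requirements. It is not the actual GetBestMatch logic: the column names, the sample rows and the prices are made up for the example.

```python
import csv
import io

# Made-up sample; the real file's columns and values are not shown in the post.
SAMPLE_CSV = """name,cores,memory_gb,price_per_hour
small,1,2,0.02
medium,2,8,0.09
large,4,16,0.18
"""


def best_match(csv_text, min_cores, min_memory_gb):
    """Return the cheapest row meeting both minimums, or None if nothing fits."""
    candidates = [
        row
        for row in csv.DictReader(io.StringIO(csv_text))
        if int(row["cores"]) >= min_cores and float(row["memory_gb"]) >= min_memory_gb
    ]
    if not candidates:
        return None
    return min(candidates, key=lambda row: float(row["price_per_hour"]))
```

A full scan like this is fine for a file of a few hundred rows; it is the fetch of the file itself, not the filtering, that would dominate the cost of each query.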
Possible optimizations here? If “VMchooser” grows popular (read: beyond my personal usage), I’ll have to upgrade to multiple environments to provide the needed stability during changes.
That also means creating several branches and linking the production & non-production webapps to different branches. Though for now, that’s still overkill… Next to that, having the csv file on another medium (like the storage account) would also make more sense at that point. At the moment I’m sacrificing some performance with the current setup, though that’s still bearable. The architecture lets me change the location just by adjusting an environment variable.
(Disclaimer : Really… Do not test in production! Unless canary testing…)
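Switching the CSV location through an environment variable, as mentioned above, could look roughly like this. The variable name `VMCHOOSER_CSV_SOURCE` and the default file name are hypothetical; the sketch only shows that the same loading code can serve a URL or a local path.

```python
import os
from urllib.parse import urlparse
from urllib.request import urlopen

DEFAULT_SOURCE = "vmsizes.csv"  # hypothetical default location


def resolve_source(env=os.environ):
    """Pick the CSV location from the environment, falling back to the default."""
    return env.get("VMCHOOSER_CSV_SOURCE", DEFAULT_SOURCE)


def read_source(location):
    """Read the CSV text from an http(s) URL or from a local file path."""
    if urlparse(location).scheme in ("http", "https"):
        with urlopen(location) as response:
            return response.read().decode("utf-8")
    with open(location, encoding="utf-8") as handle:
        return handle.read()
```

Moving the file from one host to another then only requires changing the setting, not the code.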
The basic question of “what’s going on with my app”… That’s the gap which Application Insights fills in this architecture.
At the moment, I’ve limited the implementation to the basic integration. In the future I can still improve this by adding several checkpoints within my code and letting those flow back to Application Insights.
As my back-end API is darn scalable, I also played with the idea of API management. Here I could allow third parties access to the same API via a subscription model.
At this time, that part isn’t used… Though it’s nice to know the option is already there, should the need for it ever arise. 😉
What does VMchooser cost?
- Web App : 27€ / month
  - Shared with 12 other webapps
  - Hosts the front-end code & also the back-end functions
- Storage Account : At 1.7 euro cent per GB, that will not kill me at all.
- API Management : The developer edition starts at about 42€.
- Application Insights : Free up to 1 GB. Then priced at 1.94€ per GB.
- GitHub : Free, as it’s a public repository.
After “optimizing” (read: removing API Management due to no immediate need), the cost comes down to a few euros per month. Because in all honesty, the app service plan (B1 @ 27€ / month) is used for A LOT. 🙂 Let’s say I would decouple the functions from that webapp. Then each CSV parsing job of 5 minutes (several thousand VMs) would cost me about 0.0042 euro cent for the parsing and at most 7 cent for all the API calls, if I’m being very pessimistic.
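The “few euros per month” can be checked with some quick arithmetic on the figures from the list above. The sketch assumes the plan cost is split evenly across the 13 web apps and that the storage price applies per GB per month; both are simplifications of mine, not something the bill itself states.

```python
PLAN_COST_EUR = 27.0        # B1 plan per month, from the post
APPS_ON_PLAN = 13           # web apps sharing the plan, from the post
STORAGE_EUR_PER_GB = 0.017  # 1.7 euro cent per GB, from the post


def monthly_share(stored_gb):
    """Even split of the plan cost plus storage; the even split is an assumption."""
    return PLAN_COST_EUR / APPS_ON_PLAN + stored_gb * STORAGE_EUR_PER_GB


print(f"{monthly_share(1):.2f} EUR per month")
```

With 1 GB stored this comes to roughly 2.09€ per month for VMchooser’s share.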
Tales from the crypt… euhr… field
For the CSV parsing, I started off with a Logic App that used a function as one of its steps. Here I learned there was a hard limit, being a 2 minute timeout on HTTP requests. As large jobs took longer than that, I switched to a functions-only architecture. In reality that also had a great impact on the budget… That aside, I also needed to increase the timeout of my Azure Function to cope with the longer-running jobs.
This architecture might be a “tad different” from what you are used to… I fully get that. Is this way a good plan for every application? Maybe not. 🙂 Though I hope it showed you that there is an alternative way of architecting your application(s). As you all probably know, I’m a huge fan of Azure and also the entire container & serverless spaces. The above was an example of a fully “serverless” architecture in Azure. Here I really love Azure Functions! And I highly recommend trying it out… I can claim it’s very good, though you’ll only truly believe me once you’ve tried it out for yourself. Then you’ll notice the ease of setting things up & integrating them. But also the management features that come out of the box and reduce the development pain, euhr … annoyances, for you. 🙂