Moving from Azure VMs to Azure VM Scale Sets – Runtime Instance Configuration

In my previous post I covered how you can move from deploying a solution to pre-provisioned Virtual Machines (VMs) in Azure to a process that allows you to create a custom VM Image that you deploy into VM Scale Sets (VMSS) in Azure.

As I alluded to in that post, one item we will need to take care of in order to truly move to a VMSS approach using a VM image is to remove any local static configuration data we might bake into our solution.

There is a range of options when going down this path, from solutions you build yourself to running a service such as HashiCorp's Consul.

The environment I'm running in is fairly simple, so I decided to focus on a simple custom build. The remainder of this post covers the approach I've used to build a solution that works for me, and perhaps might inspire you.

I am using an ASP.NET Web API as my example, but I am also using a similar pattern for Windows Services running on other VMSS instances - only the location of your startup code will differ.

The Starting Point

Back in February I blogged about how I was managing configuration of a Web API I was deploying using VSTS Release Management. In that post I covered how you can use the excellent Tokenization Task to create a Web Deploy Parameters file that can be used to replace placeholders on deployment in the web.config of an application.

My sample web.config is shown below.

web.config

<configuration>
  <appSettings>
    <add key="webpages:Version" value="3.0.0.0" />
    <add key="webpages:Enabled" value="false" />
    <add key="ClientValidationEnabled" value="true" />
    <add key="UnobtrusiveJavaScriptEnabled" value="true" />
    <add key="LoggingDatabaseAccount" value="__docdburi__" />
    <add key="LoggingDatabaseKey" value="__docdbkey__" />
    <add key="LoggingDatabase" value="__loggingdb__" />
    <add key="LoggingDatabaseCollection" value="__loggingdbcollection__" />
  </appSettings>
</configuration>

The problem with this approach when we shift to VM Images is that these values are baked into the VM Image. The image is the build output, and it should be deployable to any environment. I could work around this by building a VM Image for each environment to which I deploy, but frankly that is less than ideal and breaks the idea of "one binary (immutable VM), many environments".

The Solution

I didn't really want to go down the route of service discovery using something like Consul, and I really only wanted to use Azure's default networking setup. That networking requirement ruled out a custom private DNS, so I couldn't build configuration service discovery on hostname lookups.

...and to be honest, with the PaaS services I have in Azure, I can build my own solution pretty easily.

The solution I landed on looks like this:

Store runtime configuration in Cosmos DB and geo-replicate this information so it is highly available. Each VMSS setup gets its own configuration document, identified by a prefix-service pair (the VMSS name prefix plus the service type) that is used as the document ID.
Leverage a read-only Access Key for Cosmos DB because we won't ever ask clients to update their own config!
Use Azure Key Vault to store the Cosmos DB Account and Access Key that are used to read the actual configuration. Key Vault is regionally available by default, so we're good there too.
Configure an Azure AD Service Principal with access to Key Vault to allow our solution to connect to Key Vault.

Update July 2018: Microsoft has released Managed Service Identities as a way to do delegated permissions between resources in Azure - I would strongly advise wrapping your head around this and leveraging MSI as the way to connect your VMSS instances to Key Vault (and potentially other resources).

I used a conventions-based approach to configuration, so that the whole process works based on the VMSS instance name and the service type requesting configuration. You can see this in the code: both the URL used to access Key Vault and the Cosmos DB document ID are built from those two values.
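To make the convention concrete, here is a minimal sketch of how a Key Vault secret URL could be derived from the VMSS prefix and service type. The vault name (myconfigvault), the class name and the "setting" suffix are all hypothetical placeholders of mine, not the names used in the Gist; only the https://{vault}.vault.azure.net/secrets/{name} shape is the standard Key Vault secret identifier format.

```csharp
using System;

// Hypothetical helper: the vault name and the "setting" suffixes are
// placeholders for illustration, not the names used in the Gist.
public static class ConfigConventions
{
    private const string VaultBaseUrl = "https://myconfigvault.vault.azure.net";

    // Builds a secret identifier of the form
    // https://{vault}.vault.azure.net/secrets/{prefix}-{service}-{setting}
    public static string SecretUrl(string vmssPrefix, string serviceType, string setting)
    {
        foreach (var part in new[] { vmssPrefix, serviceType, setting })
        {
            if (string.IsNullOrWhiteSpace(part))
                throw new ArgumentException("Every part of the secret name is required.");
        }

        // Key Vault secret names may only contain letters, digits and dashes,
        // so the parts are joined with dashes.
        string secretName = $"{vmssPrefix}-{serviceType}-{setting}".ToLowerInvariant();
        return $"{VaultBaseUrl}/secrets/{secretName}";
    }
}
```

With this sketch, a call such as ConfigConventions.SecretUrl("swtst01", "webapi", "account") yields a URL ending in /secrets/swtst01-webapi-account, so adding a new VMSS group only means adding secrets that follow the same naming pattern.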

The resulting changes to my Web API code (based on the earlier web.config sample) are shown in this Gist. This all occurs at application startup time.

I have also defined a default Application Insights account into which any instance can log should it have problems (which includes not being able to read its expected Application Insights key). This is important as it allows us to troubleshoot issues without needing to get access to the VMSS instances.
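The fallback idea can be sketched roughly as follows. This is an illustration under my own assumptions rather than the code from the Gist: the class name, the all-zero default key and the delegate-based shape are hypothetical.

```csharp
using System;

// Hypothetical sketch of "fall back to a shared Application Insights key".
public static class TelemetryKeyResolver
{
    // Placeholder value: substitute the key of your shared fallback account.
    public const string DefaultInstrumentationKey = "00000000-0000-0000-0000-000000000000";

    // readConfiguredKey is whatever reads the AppInsightsKey from runtime config.
    public static string Resolve(Func<string> readConfiguredKey)
    {
        try
        {
            string key = readConfiguredKey();
            // A missing or blank key is treated the same as a failed read.
            return string.IsNullOrWhiteSpace(key) ? DefaultInstrumentationKey : key;
        }
        catch (Exception)
        {
            // Configuration could not be read, so log to the shared account instead.
            return DefaultInstrumentationKey;
        }
    }
}
```

The resolved key would then be handed to the Application Insights SDK at startup, so that even an instance with broken configuration still reports somewhere you can see it.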

Here's how we authorise our calls to Key Vault to retrieve our initial configuration Secrets (called on line 51 of the sample code on the Gist).

KeyVaultUtils.cs

using System;
using System.Threading.Tasks;
using Microsoft.IdentityModel.Clients.ActiveDirectory;

namespace MyDemo.Website.WebApi.Helpers
{
    public static class KeyVaultUtils
    {
        // You could also utilise a certificate-based service principal as well
        // Yes, these are hardcoded :)
        private const string kvSP = "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx";
        private const string kvSPkey = "NOTREALLYTHEPASSWORD";

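        // The three-parameter shape matches KeyVaultClient.AuthenticationCallback;
        // 'scope' is not used by this implementation.
        public static async Task<string> GetToken(string authority, string resource, string scope)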
        {
            var authContext = new AuthenticationContext(authority);
            var clientCred = new ClientCredential(kvSP, kvSPkey);
            var result = await authContext.AcquireTokenAsync(resource, clientCred);

            if (result == null)
                throw new InvalidOperationException("Failed to obtain the JWT token");

            return result.AccessToken;
        }
    }
}

My goal was to make configuration easily manageable across multiple VMSS instances, which requires some knowledge of how VMSS instance names are created.

The basic details are that they consist of a hostname prefix (based on what you input at VMSS creation time) followed by a base-36 (hexatrigesimal) value representing the actual instance. There's a great blog post from Guy Bowerman from Microsoft that covers this in detail so I won't reproduce it here.
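As a rough illustration of that naming scheme, the sketch below decodes the base-36 suffix of a hostname back into a number. It assumes the suffix is the trailing six characters (the same length this post strips when building the configuration ID); the class and method names are my own.

```csharp
using System;

// Hypothetical helper for inspecting VMSS instance hostnames.
public static class VmssHostName
{
    private const string Base36Digits = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ";
    private const int SuffixLength = 6; // assumed suffix length

    // Decodes the trailing base-36 suffix, e.g. "swtst01000010" -> 36.
    public static long DecodeInstanceId(string hostName)
    {
        if (hostName == null || hostName.Length <= SuffixLength)
            throw new ArgumentException("Host name is too short to contain a VMSS suffix.", nameof(hostName));

        string suffix = hostName.Substring(hostName.Length - SuffixLength).ToUpperInvariant();
        long value = 0;
        foreach (char c in suffix)
        {
            int digit = Base36Digits.IndexOf(c);
            if (digit < 0)
                throw new FormatException($"'{c}' is not a base-36 digit.");
            value = value * 36 + digit; // shift left one base-36 place, add digit
        }
        return value;
    }
}
```

Six base-36 digits give plenty of room for instance numbers, and because the suffix has a fixed width, the prefix can always be recovered by dropping a fixed number of trailing characters.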

The final piece of the puzzle is the Cosmos DB configuration entry which I show below.

runtimeconfig.json

{
    "id": "swtst01-webapi",
    "AccessKey": "MATCHES_KEY_OF_CONFIG DB",
    "AppInsightsKey": "yyyyyyyy-yyyy-yyyy-yyyy-yyyyyyyyyyyy",
    "LoggingDatabaseAccount": "https://myruntimedb.documents.azure.com:443/",
    "LoggingDatabaseKey": "COSMOS_DB_KEY_OF_DBACCOUNT",
    "LoggingDatabase": "runtimeitems",
    "LoggingDatabaseCollection": "runtimedatabases"
}

The 'id' field maps to the VMSS instance prefix that is determined at runtime based on the name you used when creating the VMSS. We strip the trailing 6 characters to remove the unique component of each VMSS instance hostname.
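A minimal sketch of that lookup key, assuming the hostname comes from something like Environment.MachineName and that IDs are stored in lower case (both assumptions of mine, as are the class and method names):

```csharp
using System;

// Hypothetical helper that turns an instance hostname into a config document ID.
public static class RuntimeConfigId
{
    private const int InstanceSuffixLength = 6; // unique per-instance component

    // "swtst01000003" + "webapi" -> "swtst01-webapi"
    public static string ForHost(string hostName, string serviceType)
    {
        if (hostName == null || hostName.Length <= InstanceSuffixLength)
            throw new ArgumentException("Host name is too short to contain a VMSS suffix.", nameof(hostName));
        if (string.IsNullOrWhiteSpace(serviceType))
            throw new ArgumentException("Service type is required.", nameof(serviceType));

        // Drop the trailing six characters to get the shared VMSS prefix.
        string prefix = hostName.Substring(0, hostName.Length - InstanceSuffixLength);

        // Lower-casing is an assumption: Windows machine names are often
        // upper case, while the sample document ID is lower case.
        return $"{prefix}-{serviceType}".ToLowerInvariant();
    }
}
```

Every instance in the same scale set resolves to the same ID, for example RuntimeConfigId.ForHost("swtst01000003", "webapi") gives "swtst01-webapi", which is why one document can serve the whole VMSS.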

The outcome of the three components (code changes, Key Vault and Cosmos DB) is that I can quickly add or remove VMSS groups in configuration, change where their configuration data is stored by updating the Key Vault Secrets, and even update running VMSS instances by changing the configuration settings and then forcing a restart on the VMSS instances, causing them to re-read configuration.

Is the above the only or best way to do this? Absolutely not :)

I'd like to think it's a good way that might inspire you to build something similar or better :)

Interestingly, having got to this stage, I've also realised there might be some value in moving this solution to Service Fabric in future, though I am more inclined to shift to Containers running under the control of an orchestrator like Kubernetes.

What are your thoughts?

Until the next post!