Gone in 60 Seconds – Cloud Load Balancing Timeouts

After what seems like ages we finally managed to get to the bottom of the weird issues we were having with a load balanced instance of Umbraco hosted at Amazon Web Services.  Despite having followed Umbraco’s load balancing guide and tested extensively, we were seeing some weird behaviours for multi-page publishing and node copy operations.  Multiple threads were created for each request, which led to unexpected outcomes for users.

Eventually we managed to isolate this to the default (and unchangeable at time of writing) 60 second timeout behaviour of AWS’ ELB infrastructure combined with the long-running nature of the POST requests from Umbraco.  The easiest way to see the real side effect of the 60 second timeout is to put a debugging proxy like Fiddler or Charles between your browser and the ELB.  What we saw is below.

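[Screenshot: gsixty – debugging proxy trace of the publish.aspx request, with the 60 second termination marked by a red square]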

So, you can see the culprit right there in the red square – the call to the publish.aspx page is terminated at 60 seconds by the ELB which causes the browser to resubmit it – Ouch!  This also occurs when you copy or move nodes and the process exceeds 60 seconds – you get multiple nodes!

To be clear – this is not a problem that is isolated to Umbraco – there is a lot of software that relies on long-running HTTP POST operations with the expectation that they will run to completion.

Now there are probably a range of reasons why AWS has this restriction.  The forum posts (dating back to 2009) don’t enlighten, but it’s not hard to see why: in an “elastic” environment anything that takes a long time to complete may be a bad thing (you can’t “scale up” your ELB if it’s still processing a batch of long-running requests).  I can see the logic to this restriction – it simplifies the problems the AWS engineers need to solve – but it does introduce a limitation that isn’t covered clearly enough in any official AWS documentation.

The real solution here has to come from better software design that takes this limitation of the infrastructure into account.  Patterns like Post-Redirect-Get help: submit a short POST request to initiate the process on the server, redirect to another page, and then use async calls from the browser to check on the status of the process.
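To make that pattern a bit more concrete, here is a minimal sketch of the server-side piece, assuming a simple in-memory job registry.  The names LongRunningJobs, Start and GetState are hypothetical and are not part of Umbraco or any AWS library.  The short POST handler would call Start and redirect to a status page carrying the returned id, and the browser would then poll an endpoint that calls GetState.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

public enum JobState { Running, Succeeded, Failed }

// Hypothetical helper: tracks long-running work so the HTTP request that
// starts it can return well inside the load balancer's timeout.
public static class LongRunningJobs
{
    private static readonly ConcurrentDictionary<Guid, JobState> States =
        new ConcurrentDictionary<Guid, JobState>();

    // Called by the short POST handler. Returns immediately with an id
    // that the handler can place in the redirect URL.
    public static Guid Start(Action work)
    {
        var id = Guid.NewGuid();
        States[id] = JobState.Running;

        // The work itself runs on a background thread, not on the request.
        Task.Run(() =>
        {
            try
            {
                work();
                States[id] = JobState.Succeeded;
            }
            catch (Exception)
            {
                States[id] = JobState.Failed;
            }
        });

        return id;
    }

    // Called by the status endpoint the browser polls. Null means the id is unknown.
    public static JobState? GetState(Guid id)
    {
        JobState state;
        return States.TryGetValue(id, out state) ? state : (JobState?)null;
    }
}
```

One caveat with this sketch: the state lives in the memory of a single web server, so behind a load balancer the polling request could land on a different host.  In a real farm the job state would need to sit somewhere shared, such as the database.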

Yes, I know, we could probably run our own instances with HAProxy on, but why build more infrastructure to manage when what’s there is perfectly fit for purpose?

Updated – You Have An Alternative

10 September – I’ve been lucky enough to be attending the first AWS Architecture course run by Amazon here in Sydney and the news on this front is interesting.  By default you get 60 seconds, *but* you can request (via your AWS Account Manager or Architect) that this timeout be increased up to 17 minutes maximum.  This is applied on a per-ELB basis so if you create more ELB instances you would need to make the same request to AWS.

My advice: fix your application before you ask for a non-standard ELB setup.

Not Just For AWS ELB

Now, chaps (and ladies), you also need to be aware that this issue will raise its head in Windows Azure as well but most likely after a longer duration.  A very obliquely written blog post on MSDN suggests it will now be based on the duration AND the number of concurrent connections you have.

You have been warned!

Bouquets and Brickbats for Amazon’s AWS

This week on April 3 we were lucky enough to have the AWS Lean Cloud bandwagon roll into town in Sydney.  We’re a heavy user and believer in Cloud (from any vendor) at TheFARM so our tech team physically went along to the event (yeah, I know, old hat right – should have watched it streamed online).

Amazon’s CTO Werner Vogels got up in his Chaos Monkey t-shirt and talked at length about AWS as a platform and how over the last three years there has been an explosion in its usage.  While recycling content from earlier presentations in 2011 Werner showed real passion about the mission he and his team are on.  He said he welcomed feedback of any type – especially from those who aren’t using the AWS platform – he wanted to know why, what was missing, what made those people choose not to venture into AWS.

Now this doesn’t apply to us – we’re already there as are some of our customers.  But, as Werner asked for it here’s some constructive feedback based on our experiences.

Documentation Accuracy

As the slides showed, through 2009, 2010 and 2011 Amazon added a lot of features to AWS.  That’s neat.  However, we often struggle to find the authoritative piece of information about a service or feature.  There are always forum posts that rank higher in search results than the actual documentation, and those posts often provide accurate and insightful information that doesn’t align with the documentation.  I think some of this is due to the rate of change – new or changed stuff comes online so often that the documentation updates lag behind.  I’d suggest slowing down new service introduction to ensure that each new offering is properly supported at a documentation level.  More so given the volume of new(ish) AWS consumers versus what (I assume) is a relatively small support team.

No Public Roadmap

Will they, won’t they?  Yes, I know it’s commercially sensitive but how can I plan properly if I don’t know what’s coming?  And what’s with the “it’s just around the corner” posts that occur from time-to-time in response to questions in the forums?  I can’t plan with “just around the corner”.  I know being an SI partner would open up some of this information but if I’m not (or my customer isn’t) a partner how do we find that out?  Werner talked about how the platform is a great leveller and enabler for smaller businesses and start-ups but it does seem that only partners or larger customers are privy to important forward-looking information.  It shouldn’t be that way.

BTW, when will there be RDS for SQL Server?  Will there ever be a Region in Australia?

No Cookie Cutter

I know we have the AppHarbors and the Herokus of the world that make it easier for me to host than to go to raw AWS or Azure, but if Amazon wants to hit a sweet spot for the smaller end of the market then having a model that is similar to what those guys are offering would be cool.  I understand the differences and the reasons why they don’t offer it but I think they should.  Most people don’t want to care about AMIs, EBS volumes, S3 storage buckets, etc.  They want to host their website in the cheapest configuration they can so it is highly available.  Sure, I can publish my static website directly from S3, but I still need to know what S3 is!  As an example I might want to migrate my small business website to AWS.  I know what technology it is built on and I have the source code – I should be able to check a few boxes on a web form that align with my knowledge and Amazon should provision the least cost option to host it (I can choose high availability if I want as an extra cost).

Updates:

So I had a couple more items I thought should feed in here too.

Usage Reconciliation

If you’ve ever used the Amazon AWS portal to retrieve usage stats you’ll quickly discover how hard it is to use the data that it provides (at time of writing).  Trying to work out what cost you actually incurred per usage item is nigh on impossible.  You almost have to take the bill you receive at face value.  I found this particularly annoying when you consider the excellent monthly usage calculator that is provided to work out costs in advance!

Delegated Authorisation

Man, this area needs some work!  One of our customers granted us access to various items within their account.  Think it meant we could do anything useful? Nope.  401 Denied. We are locked out of so much functionality it’s not funny.  Now this might be due to lack of understanding of how IAM and the permission system work, but I’d argue if it’s that hard for experienced IT professionals to get right then there is something fundamentally wrong in the design.

So that’s some thoughts and feedback for Werner!  The event was good and really showed the passion Amazon has for its offering (and how much local interest there is!)  The ability to hear from some “thought leaders” in the Lean space such as Eric Ries was also excellent.

Ultimately I think the clear message is that if you haven’t started a journey to the Cloud (whatever flavour) it’s time you should.

Using Amazon SES for .Net Application Mail Delivery

Until March 2012 Amazon’s Simple Email Service (SES) had limited support for mail being sent via existing .Net code and the IIS SMTP virtual server.  Some recent changes mean this is now possible so in this post I’ll quickly cover how you can configure your existing apps to utilise SES.

If you don’t understand why you should be using SES for your applications then you should look at the Amazon SES FAQ.  Before you start any of this configuration, make sure that you have created your SMTP credentials on the AWS console and that you have an appropriately validated sender address (or addresses).  Amazon is really strict here as they don’t want to get blacklisted as a host for spammers.

IIS Virtual SMTP Server

Firstly, let’s look at how we can set up the SMTP server as a smart host that forwards mail on to SES for you.  This approach means that you can configure all your applications to forward via IIS rather than talking directly to the SES SMTP interface.

1. Open up the IIS 6 Manager and select the SMTP Virtual Server you want to configure.

[Screenshot: iis_virtual_smtp]

2. Right-click on the server and select Properties.

3. In the Properties Window click on the Delivery tab.

4. On the Delivery tab click on the Outbound Security button on the bottom right.

5. In the Outbound Security dialog select “Basic Authentication” and enter your AWS SES Credentials.  Make sure you check the “TLS Encryption” box at the bottom left of the dialog.  Click OK. Your screen should look similar to this:

[Screenshot: delivery_setup]

6. Now open the Advanced Delivery dialog by clicking on the Advanced button on the Delivery tab.

7. Modify the dialog so it looks similar to the below.  I put the internal DNS name for my host here – the problem with this is that if you shut off your Instance the name will change and you need to update this.  Click OK.

[Screenshot: advanced_delivery]

Now you should be ready to use this IIS SMTP Virtual Server as a relay for your applications to SES.  Make sure you set AWS SecurityGroups up correctly and that you are restricting which hosts can relay via your SMTP server.

Directly from .Net Code

Without utilising the Amazon AWS SDK for .Net you can also continue to send mail the way you always have – you will need to make the following change to your App.config or Web.config file (the mailSettings element sits inside the system.net section).

<mailSettings>
      <smtp deliveryMethod="Network" from="validated@yourdomain.com">
          <network defaultCredentials="false"
                   enableSsl="true"
                   host="email-smtp.us-east-1.amazonaws.com"
                   port="25"
                   userName="xxxxxxxx"
                   password="xxxxxxxx" />
      </smtp>
</mailSettings>

Thanks to the author of the March 11, 2012 post on this thread on the AWS support forums for the configuration file edits above.
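With that configuration in place the sending code itself doesn’t need to know anything about SES.  The sketch below assumes a .Net Framework 4 (or later) application using System.Net.Mail; the class name, method name and message text are made up for illustration.

```csharp
using System.Net.Mail;

public static class SesMailExample
{
    // Sends a simple message using whatever is configured under
    // system.net/mailSettings in App.config or Web.config.
    public static void SendTestMessage(string recipient)
    {
        // The parameterless constructors pick up the host, port, SSL flag,
        // credentials and default "from" address from the config file.
        using (var message = new MailMessage())
        using (var client = new SmtpClient())
        {
            message.To.Add(recipient);
            message.Subject = "Test message via Amazon SES";
            message.Body = "If you can read this, the SES relay is working.";

            client.Send(message);
        }
    }
}
```

Because nothing SES-specific appears in the code, existing applications that already build an SmtpClient this way should only need the config change.  If outbound traffic on port 25 turns out to be restricted from your instance, trying port 587 in the config is probably the first thing to check, though I’d confirm the supported ports against the current SES documentation.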

With these two changes most “legacy” .Net applications should be able to utilise Amazon’s SES service for mail delivery.  Happy e-mailing!

Amazon AWS Elastic Load Balancers and MSBuild – BFF

Our jobs take us to some interesting places sometimes – me, well, recently I’ve spent a fair amount of time stuck in the land of Amazon’s cloud offering AWS.

Right now I’m working on a release process based around MSBuild that can deploy to a farm of web servers at AWS.  As with any large online offering ours makes use of load balancing to provide a reliable and responsive service across a set of web servers.

Anyone who has managed deployment of a solution in this scenario is most likely familiar with this approach:

  1. Remove target web host from load balancing pool.
  2. Update content on the web host and test.
  3. Return web host to load balancing pool.
  4. Repeat for all web hosts.
  5. (Profit! No?!)
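In code the cycle above is just a loop.  Here is a rough sketch where the three operations are passed in as delegates; the class and parameter names are hypothetical and not part of MSBuild or the AWS SDK.

```csharp
using System;
using System.Collections.Generic;

public static class RollingDeployment
{
    // Runs the remove / update-and-test / return cycle one host at a time.
    // Stops and returns false at the first host whose update or test fails,
    // leaving that host out of the pool so it cannot serve broken content.
    public static bool Run(
        IEnumerable<string> hosts,
        Action<string> removeFromPool,
        Func<string, bool> updateAndTest,
        Action<string> returnToPool)
    {
        foreach (var host in hosts)
        {
            removeFromPool(host);

            if (!updateAndTest(host))
            {
                return false;
            }

            returnToPool(host);
        }

        return true;
    }
}
```

Stopping at the first failure is a design choice in this sketch: the remaining hosts keep serving the previous release while you work out what went wrong.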

Fantastic Elastic and the SDK

In AWS-land load balancing is provided by the Elastic Load Balancing (ELB) service which, like many of the components that make up AWS, provides a nice programmatic API in a range of languages.

Being focused primarily on .Net we are happy to see good support for it in the form of the AWS SDK for .NET.  The AWS SDK provides a series of client proxies and strongly-typed objects that can be used to programmatically interface with pretty much anything your AWS environment is running.

Rather than dive into the detail on the SDK I’d recommend downloading a copy and taking a look through the samples they have – note that you will need an AWS environment in order to actually test out code but this doesn’t stop you from reviewing the code samples.

Build and Deploy

As mentioned above we are looking to do minimal manual intervention deployments and are leveraging MSBuild to build, package and deploy our solution.  One item that is missing in this process is a way to take a target machine out of the load balancer pool so we can deploy to it.

I spent some time reviewing existing custom MSBuild task libraries that provide AWS support but it looks like many of them are out-of-date and haven’t been touched since early 2011.  AWS is constantly changing so being able to keep up with all it has to offer would probably require some effort!

The result is that I decided to create a few custom tasks that I could use for registration / deregistration of EC2 Instances from one or more ELBs.

I’ve included a sample of a basic RegisterInstances custom task below to show you how you go about utilising the AWS SDK to register an Instance with an ELB.   Note that the code below works but that it’s not overly robust.

The things you need to know for this to work are:

  1. Your AWS Security Credentials (Access and Secret Keys).
  2. The names of the ELBs you want to register / deregister instances with.
  3. The IDs of the EC2 Instances to register / deregister (for example i-SomeInstance1).

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using Microsoft.Build.Utilities;
using Microsoft.Build.Framework;
using Amazon.ElasticLoadBalancing;
using Amazon.ElasticLoadBalancing.Model;

namespace TheFarm.MsBuild.CustomTasks.Aws.Elb
{
    /// <summary>
    /// Registers one or more EC2 Instances with one or more AWS Elastic Load Balancers.
    /// </summary>
    /// <remarks>Requires the AWS .Net SDK.</remarks>
    public class RegisterInstances : Task
    {
         /// <summary>
         /// Gets or sets the load balancer names.
         /// </summary>
         /// <value>
         /// The load balancer names.
         /// </value>
         /// <remarks>The account associated with the AWS Access Key must have created the load balancer(s).</remarks>
         [Required]
         public ITaskItem[] LoadBalancerNames { get; set; }

         /// <summary>
         /// Gets or sets the instance identifiers.
         /// </summary>
         /// <value>
         /// The instance identifiers.
         /// </value>
         [Required]
         public ITaskItem[] InstanceIdentifiers { get; set; }

         /// <summary>
         /// Gets or sets the AWS access key.
         /// </summary>
         /// <value>
         /// The aws access key.
         /// </value>
         [Required]
         public ITaskItem AwsAccessKey { get; set; }

         /// <summary>
         /// Gets or sets the AWS secret key.
         /// </summary>
         /// <value>
         /// The aws secret key.
         /// </value>
         [Required]
         public ITaskItem AwsSecretKey { get; set; }

         /// <summary>
         /// Gets or sets the Elastic Load Balancing service URL.
         /// </summary>
         /// <value>
         /// The ELB service URL.
         /// </value>
         /// <remarks>Will typically take the form: https://elasticloadbalancing.region.amazonaws.com</remarks>
         [Required]
         public ITaskItem ElbServiceUrl { get; set; }

         /// <summary>
         /// When overridden in a derived class, executes the task.
         /// </summary>
         /// <returns>
         /// true if the task successfully executed; otherwise, false.
         /// </returns>
         public override bool Execute()
         {
             try
             {
                  // throw away - to test for valid URI.
                  new Uri(ElbServiceUrl.ItemSpec);

                  var config = new AmazonElasticLoadBalancingConfig { ServiceURL = ElbServiceUrl.ItemSpec };

                  using (var elbClient = new AmazonElasticLoadBalancingClient(AwsAccessKey.ItemSpec, AwsSecretKey.ItemSpec, config))
                  {
                        foreach (var loadBalancer in LoadBalancerNames)
                        {
                              Log.LogMessage(MessageImportance.Normal, "Preparing to add Instances to Load Balancer with name '{0}'.", loadBalancer.ItemSpec);

                              var initialInstanceCount = DetermineInstanceCount(elbClient, loadBalancer);

                              var instances = PrepareInstances();

                              var registerResponse = RegisterInstancesWithLoadBalancer(elbClient, loadBalancer, instances);

                              ValidateInstanceRegistration(initialInstanceCount, instances, registerResponse);

                              DetermineInstanceCount(elbClient, loadBalancer);
                        }
                  }
             }
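             catch (InvalidInstanceException iie)
             {
                   // Pass the exception message as a format argument so it actually appears in the build log.
                   Log.LogError("One or more supplied instances was invalid. {0}", iie.Message);
             }
             catch (LoadBalancerNotFoundException lbe)
             {
                   Log.LogError("The supplied Load Balancer could not be found. {0}", lbe.Message);
             }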
             catch (UriFormatException)
             {
                   Log.LogError("The supplied ELB service URL is not a valid URI. Please confirm that it is in the format 'scheme://aws.host.name'");
             }

             return !Log.HasLoggedErrors;
        }

        /// <summary>
        /// Prepares the instances.
        /// </summary>
        /// <returns>List of Instance objects.</returns>
        private List<Instance> PrepareInstances()
        {
            var instances = new List<Instance>();

            foreach (var instance in InstanceIdentifiers)
            {
                Log.LogMessage(MessageImportance.Normal, "Adding Instance '{0}' to list.", instance.ItemSpec);

                instances.Add(new Instance { InstanceId = instance.ItemSpec });
            }
            return instances;
        }

        /// <summary>
        /// Registers the instances with load balancer.
        /// </summary>
        /// <param name="elbClient">The elb client.</param>
        /// <param name="loadBalancer">The load balancer.</param>
        /// <param name="instances">The instances.</param>
        /// <returns>RegisterInstancesWithLoadBalancerResponse containing response from AWS ELB.</returns>
        private RegisterInstancesWithLoadBalancerResponse RegisterInstancesWithLoadBalancer(AmazonElasticLoadBalancingClient elbClient, ITaskItem loadBalancer, List<Instance> instances)
        {
            var registerRequest = new RegisterInstancesWithLoadBalancerRequest { Instances = instances, LoadBalancerName = loadBalancer.ItemSpec };

            Log.LogMessage(MessageImportance.Normal, "Executing call to add {0} Instances to Load Balancer '{1}'.", instances.Count, loadBalancer.ItemSpec);

            return elbClient.RegisterInstancesWithLoadBalancer(registerRequest);
        }

        /// <summary>
        /// Validates the instance registration.
        /// </summary>
        /// <param name="initialInstanceCount">The initial instance count.</param>
        /// <param name="instances">The instances.</param>
        /// <param name="registerResponse">The register response.</param>
        private void ValidateInstanceRegistration(int initialInstanceCount, List<Instance> instances, RegisterInstancesWithLoadBalancerResponse registerResponse)
        {
            var postInstanceCount = registerResponse.RegisterInstancesWithLoadBalancerResult.Instances.Count();

            if (postInstanceCount != initialInstanceCount + instances.Count)
            {
                 Log.LogWarning("At least one Instance failed to register with the Load Balancer.");
            }
        }

        /// <summary>
        /// Determines the instance count.
        /// </summary>
        /// <param name="elbClient">The elb client.</param>
        /// <param name="loadBalancer">The load balancer.</param>
        /// <returns>integer containing the instance count.</returns>
        private int DetermineInstanceCount(AmazonElasticLoadBalancingClient elbClient, ITaskItem loadBalancer)
        {
             var response = elbClient.DescribeLoadBalancers(new DescribeLoadBalancersRequest { LoadBalancerNames = new List<string> { loadBalancer.ItemSpec } });

             var initialInstanceCount = response.DescribeLoadBalancersResult.LoadBalancerDescriptions[0].Instances.Count();

             Log.LogMessage(MessageImportance.Normal, "Load Balancer with name '{0}' reports {1} registered Instances.", loadBalancer.ItemSpec, initialInstanceCount);

             return initialInstanceCount;
        }
    }
}

So now that we have a task compiled into an assembly, we can reference that assembly in our build script and invoke the task using the following syntax:


    <RegisterInstances LoadBalancerNames="LoadBalancerName1" InstanceIdentifiers="i-SomeInstance1" AwsAccessKey="YourAccessKey" AwsSecretKey="YourSecretKey" ElbServiceUrl="https://elasticloadbalancing.your-region.amazonaws.com" />
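For completeness, here is roughly what the surrounding build script might look like.  The assembly path, the target name and the AwsAccessKey / AwsSecretKey property names are assumptions for illustration; adjust them to match your own project.

```xml
<Project xmlns="http://schemas.microsoft.com/developer/msbuild/2003">

  <!-- Hypothetical path: point AssemblyFile at wherever your compiled task assembly lives. -->
  <UsingTask TaskName="TheFarm.MsBuild.CustomTasks.Aws.Elb.RegisterInstances"
             AssemblyFile="$(MSBuildProjectDirectory)\lib\TheFarm.MsBuild.CustomTasks.dll" />

  <Target Name="RegisterWithElb">
    <!-- ITaskItem[] parameters accept semicolon-separated lists. -->
    <RegisterInstances LoadBalancerNames="LoadBalancerName1;LoadBalancerName2"
                       InstanceIdentifiers="i-SomeInstance1;i-SomeInstance2"
                       AwsAccessKey="$(AwsAccessKey)"
                       AwsSecretKey="$(AwsSecretKey)"
                       ElbServiceUrl="https://elasticloadbalancing.your-region.amazonaws.com" />
  </Target>

</Project>
```

Passing the keys in as properties (for example with /p:AwsAccessKey=... on the msbuild command line) keeps them out of the build script itself.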

That’s pretty much it… there are some vagaries to be aware of.  The service call for deregistration of an Instance returns before the instance is fully de-registered from the load balancer, so don’t key any deployment action directly off that return.  Perform other checks first to make sure that the instance *is* no longer registered prior to deploying.
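One possible way to perform that check is to poll the load balancer until the instance drops off its list.  The sketch below reuses the same SDK calls and response shape as the RegisterInstances task above, so it assumes the same version of the AWS SDK for .NET (later versions may shape the response differently).  The class name, method name and retry parameters are hypothetical, and I’m assuming that DescribeLoadBalancers stops listing an instance once deregistration has completed.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using Amazon.ElasticLoadBalancing;
using Amazon.ElasticLoadBalancing.Model;

public static class ElbDeregistrationCheck
{
    // Polls the load balancer until the instance is no longer listed.
    // Returns true once it has gone, or false if it is still listed
    // after maxAttempts checks.
    public static bool WaitUntilDeregistered(
        AmazonElasticLoadBalancingClient elbClient,
        string loadBalancerName,
        string instanceId,
        int maxAttempts,
        TimeSpan delay)
    {
        for (var attempt = 1; attempt <= maxAttempts; attempt++)
        {
            var response = elbClient.DescribeLoadBalancers(
                new DescribeLoadBalancersRequest { LoadBalancerNames = new List<string> { loadBalancerName } });

            var registered = response.DescribeLoadBalancersResult.LoadBalancerDescriptions[0].Instances;

            if (!registered.Any(i => i.InstanceId == instanceId))
            {
                return true;
            }

            // Only wait if another check is still to come.
            if (attempt < maxAttempts)
            {
                Thread.Sleep(delay);
            }
        }

        return false;
    }
}
```

A deregistration task could call something like this after the deregister request and log an error if it returns false, so the build stops before anything is deployed to a host that is still taking traffic.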

I hope you’ve found this post useful in showing you what is possible when combining the AWS SDK with the extensibility of MSBuild.