
Gone in 60 Seconds – Cloud Load Balancing Timeouts

Update: July 2014: AWS announced that customers can now manage ELB timeouts.  Read more here: https://aws.amazon.com/blogs/aws/elb-idle-timeout-control/

After what seems like ages we finally managed to get to the bottom of the weird issues we were having with a load balanced instance of Umbraco hosted at Amazon Web Services.  Despite having followed Umbraco’s load balancing guide and tested extensively, we were seeing some weird behaviours for multi-page publishing and node copy operations.  Multiple threads were created for each request, which led to unexpected outcomes for users.

Eventually we managed to isolate this to the default (and unchangeable at time of writing) 60 second timeout behaviour of AWS’ ELB infrastructure combined with the long-running nature of the POST requests from Umbraco.  The easiest way to see the real side effect of the 60 second timeout is to put a debugging proxy like Fiddler or Charles between your browser and the ELB.  What we saw is below.

[Screenshot: debugging proxy trace with the publish.aspx request highlighted in a red square]

So, you can see the culprit right there in the red square – the call to the publish.aspx page is terminated at 60 seconds by the ELB which causes the browser to resubmit it – Ouch!  This also occurs when you copy or move nodes and the process exceeds 60 seconds – you get multiple nodes!

To be clear – this is not a problem that is isolated to Umbraco – there is a lot of software that relies on long-running HTTP POST operations with the expectation that they will run to completion.

Now there are probably a range of reasons why AWS has this restriction.  The forum posts (dating back to 2009) don’t enlighten, but it’s not hard to see why: in an “elastic” environment anything that takes a long time to complete may be a bad thing (you can’t “scale up” your ELB if it’s still processing a batch of long-running requests).  I can see the logic to this restriction, since it simplifies the problems the AWS engineers need to solve, but it does introduce a limitation that isn’t covered clearly enough in any official AWS documentation.

The real solution here has to come from better software design that takes into account this limitation of the infrastructure and makes use of patterns like Post-Redirect-Get to submit a short POST request to initiate the process on the server, redirect to another page and then utilise async calls from the browser to check on the status of the process.
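To make that pattern a little more concrete, here is a minimal sketch in Python (not Umbraco code, and not how any particular CMS does it).  The names handle_post, handle_status and the /status/ URL are all hypothetical, and the in-memory job registry is an assumption that only works on a single node.  The short POST queues the work and answers immediately with a 303 redirect, and the browser then polls a cheap GET until the job reports that it has finished.

```python
import threading
import time
import uuid

# Hypothetical in-memory job registry. Behind a load balancer you would
# need shared storage (database, cache) so any node can answer a poll.
_jobs = {}
_jobs_lock = threading.Lock()


def _run_job(job_id, work):
    """Run the long task in the background and record the outcome."""
    try:
        outcome = {"state": "done", "result": work()}
    except Exception as exc:  # record the failure instead of losing it
        outcome = {"state": "failed", "error": str(exc)}
    with _jobs_lock:
        _jobs[job_id] = outcome


def handle_post(work):
    """Handle the short POST: start the work, then redirect straight away.

    Returns (status_code, headers). The 303 tells the browser to follow
    up with a GET, so the long-running work never holds the POST open.
    """
    job_id = uuid.uuid4().hex
    with _jobs_lock:
        _jobs[job_id] = {"state": "running"}
    threading.Thread(target=_run_job, args=(job_id, work), daemon=True).start()
    return 303, {"Location": f"/status/{job_id}"}


def handle_status(job_id):
    """Handle the cheap GET that the browser polls asynchronously."""
    with _jobs_lock:
        job = _jobs.get(job_id)
    if job is None:
        return 404, {"state": "unknown"}
    return 200, dict(job)


if __name__ == "__main__":
    def slow_publish():
        time.sleep(0.2)  # stand-in for a publish that takes minutes
        return "published"

    status, headers = handle_post(slow_publish)
    job_id = headers["Location"].rsplit("/", 1)[-1]
    while handle_status(job_id)[1]["state"] == "running":
        time.sleep(0.05)  # the browser would poll with an async call
    print(handle_status(job_id))
```

The point of the design is that no single HTTP request lasts anywhere near 60 seconds, so the load balancer never has a reason to cut a connection and the browser never has a reason to resubmit.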

Yes, I know, we could probably run our own instances with HA Proxy on, but why build more infrastructure to manage when what’s there is perfectly fit for purpose?

Updated – You Have An Alternative

10 September – I’ve been lucky enough to be attending the first AWS Architecture course run by Amazon here in Sydney and the news on this front is interesting.  By default you get 60 seconds, *but* you can request (via your AWS Account Manager or Architect) that this timeout be increased up to 17 minutes maximum.  This is applied on a per-ELB basis so if you create more ELB instances you would need to make the same request to AWS.

My advice: fix your application before you ask for a non-standard ELB setup.
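If you do need a longer timeout as a stopgap, the July 2014 change noted at the top of this post means you can now set it yourself.  The sketch below assumes a Classic ELB and the boto3 library, whose "elb" client exposes modify_load_balancer_attributes with a ConnectionSettings / IdleTimeout attribute.  The helper name set_idle_timeout and the load balancer name in the usage comment are hypothetical, and the client is passed in so the function can be exercised without AWS credentials.

```python
def set_idle_timeout(elb_client, load_balancer_name, seconds):
    """Set the idle timeout (in seconds) on one Classic ELB.

    `elb_client` is assumed to be a boto3 Classic ELB client, i.e. the
    object returned by boto3.client("elb").
    """
    if not isinstance(seconds, int) or seconds < 1:
        raise ValueError("idle timeout must be a positive whole number of seconds")
    return elb_client.modify_load_balancer_attributes(
        LoadBalancerName=load_balancer_name,
        LoadBalancerAttributes={"ConnectionSettings": {"IdleTimeout": seconds}},
    )


# Usage (needs boto3 and AWS credentials; the name is a placeholder):
#   import boto3
#   set_idle_timeout(boto3.client("elb"), "my-umbraco-elb", 300)
```

As with the manual request described above, the setting applies to one load balancer at a time, so every ELB you create needs the same call.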


Not Just For AWS ELB

Now, chaps (and ladies), you also need to be aware that this issue will raise its head in Windows Azure as well but most likely after a longer duration.  A very obliquely written blog post on MSDN suggests it will now be based on the duration AND the number of concurrent connections you have.

You have been warned!


You Can See The Storm Early If You Watch The Clouds

Here we are at just past the half way mark of 2012 and it’s time to ask yourself “do I have any skin in the cloud game?”

Through 2011 and 2012 the relative strength of the Australian dollar has meant the cost of entry to the major cloud platforms has dropped significantly for Australian businesses.  KPMG is estimating that if 75% of Australian businesses moved to cloud services, this would have a positive influence on Australia’s GDP to the tune of $3.32 billion annually!

I will admit to having been a cloud sceptic in the past – certainly of the Azure platform, but with the recent set of changes introduced by Microsoft I’d say that Azure is now mature enough that it can be considered a competitive option against Amazon Web Services for complex application build and host (certainly in the .Net space).

What’s Your Cloud Tier?

I will add some more criteria to my original question – are you engaged on a “Tier 1” public cloud platform?  I’d classify Tier 1 as any of the mature global players – Amazon Web Services, Microsoft Azure, Google App Engine are probably top three here (there are others but I’m not going to try to rate / rank the multitude available…)

Beyond this I’d classify a range of local “Tier 2” providers that don’t compete on the same global scale as Tier 1 but offer similar sorts of options.  For the most part these providers tend to be not much more than highly virtualised traditional hosting businesses where you aren’t actually that far away from the bare metal.

Private Clouds

If you’re doing something like a Virtual Private Cloud (VPC) on a Tier 1 cloud platform then OK, you’re in the game.  If you built your own “Private Cloud” (whatever that is) or you’re running virtual machines in a Tier 2 provider then I’m afraid you can go and take a shower and head home.

My point here is that if you’re needing to care about anything vaguely hardware related or you don’t have global reach as an option on your platform then you’re not truly in the cloud.

Architectural Change

Once upon a time we had to care about resource utilisation when we ran our applications in the shared context of a mainframe that had limited (and expensive) resources.  The commoditisation of compute resources over the last 30 years means we stopped caring about the cost of CPU, RAM, Disk and network resources (for the most part).  Virtualisation only added to this.

Also, we controlled entire platforms top-to-bottom so we could tune or tweak aspects of the platform to suit our demands.  Virtualisation took away some of this flexibility but if you’ve ever spent any time managing VMWare or other systems you’ll know that there’s a large number of tuning possibilities that make the virtualisation layer almost entirely transparent.

We have become lazy in our architectural practices.  We stopped needing to solve some challenges because we could assume them away based on tuning our resources. Guess what?  We don’t get that any more with the cloud.  We still get commoditised resources but we also share it with other tenants.  This means we do need to start solving these challenges again through better architectural design.

Examples of common cloud scenarios that we haven’t had to solve recently on our own platforms include:

  • Bandwidth constrained shared LAN segments.
  • I/O constrained disk access.
  • Transient component failure.
  • Dynamic scale up / scale down.
  • Pay-per-use for LAN traffic, disk access and other components.

If you’re not changing your architectural practices to take the above into your designs then you can also pack the bags and head home.
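As one illustration of designing for transient component failure, here is a minimal Python sketch of retrying a call with exponential backoff and jitter.  It is a generic pattern and not tied to any particular cloud SDK.  TransientError and call_with_retry are hypothetical names, and the default attempt count and delays are arbitrary assumptions you would tune for your own platform.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical marker for failures that are worth retrying."""


def call_with_retry(operation, attempts=5, base_delay=0.1, max_delay=5.0,
                    sleep=time.sleep):
    """Call `operation`, retrying on TransientError with exponential backoff.

    The delay ceiling doubles after each failure and is capped at
    `max_delay`; the actual wait is a random value up to that ceiling so
    that many clients do not all retry in lock-step.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return operation()
        except TransientError:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the failure to the caller
            ceiling = min(max_delay, base_delay * (2 ** attempt))
            sleep(random.uniform(0, ceiling))
```

One caution: automatic retries are only safe when the operation can be repeated without side effects.  Blindly retrying something like a node copy would recreate exactly the duplicate-node problem described in the load balancing post.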

Additionally, if you’re a vendor and you aren’t making your licensing work for dynamic scale up / scale down you also lose the right to a spot on the team (and I have spoken to some vendors who aren’t supporting the cloud because it will gut their licensing model – not that they said it in so many words!)

In Summary

Ultimately the point I am trying to make is that if you haven’t been actively engaged in looking at how you can move to the cloud then you are already too late to gain any form of competitive advantage in moving to it.  You should immediately start looking at ways to utilise the cloud even if it is only via small-scale deployments that are not necessarily related to key parts of your business.

I know I haven’t touched here on the data privacy / jurisdiction issues that are obviously a big issue for most Australian businesses, but there are ways to work around those challenges in the way you design and build your solutions.  Also, it’s highly likely we will see at least one Tier 1 cloud provider here in Australia with a full offering prior to the end of 2013.  You should be getting ready now.

Finally, despite my obvious advocacy for the cloud you should always be wary of shamans.
Love consultants!
