Last week I was looking through my timeline and came across this tweet from Troy Hunt asking about how the autoscale features in Azure worked.
(click on the date of the above tweet to view the entire conversation I had with Troy).
In and of itself this question doesn’t seem unusual, until you think about it a bit: how long does a scale-out actually take?
Well the answer is a bit trickier than you might think (watch the great video on Channel 9 for more details on how Azure does it).
This is why I am proposing a new metric for use in cloud autoscale: Time To Scale Out (TTSO).
This metric can be defined as:
The time between when the cloud fabric determines that a scale-out is required and the time that the scale-out instance is serving requests.
Many people confuse the time at which a scale event is fired (e.g. five minutes in the case of Azure Websites) with the time at which a new instance is actually serving traffic (5 minutes + N).
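To make the distinction concrete, here is a minimal sketch using entirely hypothetical timestamps: the scale event firing is only the start of the clock, and TTSO ends when the new instance serves its first request.

```python
from datetime import datetime, timedelta

# Hypothetical timestamps, as might be pulled from a provider's activity log.
load_breach_detected = datetime(2014, 1, 10, 9, 0, 0)               # metric crosses threshold
scale_event_fired    = load_breach_detected + timedelta(minutes=5)  # fabric decides to scale out
first_request_served = scale_event_fired + timedelta(minutes=7)     # instance provisioned, booted, warmed

# TTSO runs from the scale-out decision to the instance serving traffic.
ttso = first_request_served - scale_event_fired
print(ttso)  # 0:07:00
```

The total time users feel the pressure is longer still: from the load breach to the first served request is the scale-event delay plus the TTSO.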
On my current cloud IaaS engagement we’re looking at how new applications can leverage autoscale correctly and have determined that:
- Prefer custom machine images over vanilla images with startup customisations (i.e. via cloud-init or similar constructs).
- Prefer startup customisations over deployment-time configuration via tools such as Puppet or Chef.
or, another way:
- Simple is better – if you have too many dependencies at startup, your TTSO will be substantial.
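As an illustration of the startup-customisation approach, here is a hypothetical cloud-init user-data fragment (the package and service names are made up for illustration). Every step in it runs on first boot of the new instance and adds directly to your TTSO, which is why a pre-baked custom image that has already done this work will usually scale out faster:

```yaml
#cloud-config
# Each of these runs on first boot of the new instance,
# delaying the moment it can serve traffic.
package_update: true
packages:
  - nginx
runcmd:
  - systemctl enable --now nginx
```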
Autoscale is not magic (and won’t save you)
I think we sometimes take for granted the elastic capabilities of cloud computing and assume that putting a few autoscale parameters in should see off unexpected peaks in demand.
The truth is that the majority of applications we put in the cloud will never need metric-driven autoscale (i.e. based on CPU utilisation or other metrics), but I can guarantee that many would benefit from manual scaling or scheduled autoscale (end of week, end of month, end of financial year, anyone?).
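A sketch of what scheduled (as opposed to metric-driven) scaling logic amounts to – the dates, thresholds, and instance counts here are all assumptions for illustration:

```python
from datetime import date

def scheduled_instance_count(today: date, baseline: int = 2, peak: int = 6) -> int:
    """Return the desired instance count for a given date.

    Scales up ahead of known busy periods (end of month, end of
    financial year) instead of waiting for a CPU metric to spike.
    """
    end_of_financial_year = today.month == 6 and today.day >= 25  # assumes an AU-style June 30 year end
    end_of_month = today.day >= 28
    return peak if (end_of_month or end_of_financial_year) else baseline

print(scheduled_instance_count(date(2014, 1, 15)))  # quiet mid-month -> 2
print(scheduled_instance_count(date(2014, 1, 30)))  # month end -> 6
```

Because the scale-up happens before the load arrives, the TTSO is paid outside the busy window rather than during it.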
Another important point to take into consideration is the performance of your cloud provider’s fabric when scaling. You might be in a part of their datacentre with a lot of “noisy neighbours” which might make scale events take longer – you have little or no control over this.
Will you meet your SLA?
Ultimately you will need to test any application you intend to use autoscale with to ensure that your SLAs will be met in the event that you autoscale. Can your minimum capacity cope with the load during your TTSO? Can you make your number of scale units larger to avoid multiple scale events?
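The "can your minimum capacity cope during your TTSO" question comes down to simple arithmetic. A sketch, with entirely hypothetical per-instance throughput figures:

```python
def copes_during_ttso(min_instances: int,
                      requests_per_instance: float,
                      peak_request_rate: float) -> bool:
    """True if the pre-scale fleet can absorb the peak load for the
    whole TTSO window, i.e. until the new instance is serving."""
    return min_instances * requests_per_instance >= peak_request_rate

# Hypothetical numbers: instances handling 100 req/s each vs a 250 req/s peak.
print(copes_during_ttso(2, 100, 250))  # False: requests will queue or fail during the TTSO window
print(copes_during_ttso(3, 100, 250))  # True: a larger minimum rides out the TTSO window
```

The same arithmetic argues for larger scale units: adding capacity in bigger steps means fewer scale events, and fewer TTSO windows in which you are under-provisioned.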
Make sure you bake startup telemetry into your instances so you can measure your TTSO and work on refining it over time (and ensuring that each new generation of your machine image doesn’t negatively impact your TTSO).
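A minimal sketch of what that startup telemetry might look like – the function name is an assumption, and in practice you would ship the measurement to your monitoring system rather than print it:

```python
import time

BOOT_STARTED = time.monotonic()  # capture as early in instance startup as possible

def record_first_request() -> float:
    """Call when the first real request is served; returns the seconds of
    startup latency that count toward this instance's TTSO."""
    startup_seconds = time.monotonic() - BOOT_STARTED
    # In a real instance you'd ship this to your telemetry pipeline so TTSO
    # can be compared across generations of your machine image.
    return startup_seconds

elapsed = record_first_request()
print(f"startup latency: {elapsed:.3f}s")
```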