Quick Links To Help You Learn About Developing For The Cloud

Unsurprisingly I think the way Cloud Computing is transforming the IT industry is also leading to easier ways to learn and develop skills about the Cloud. In this post I’m going to give a run down on what I think some of the best ways are to start dipping your toe into this space if you haven’t already.

Sign up for a free trial

This is easy AND low cost. Turn up to the sign-up page for most major players and you’ll get free or low-cost services for a timed period. Sure, you couldn’t start the next Facebook at this level, but it will give you enough to start to learn what’s on offer.  You can run VMs, deploy solutions, utilise IaaS, PaaS and SaaS offerings and generally kick the tyres of the features of each. At time of writing these are:

Learn the APIs and use the SDKs

Each of Amazon, Azure, Google, Office 365 and Rackspace offer some form of remote programmable API (typically presented as REST endpoints).  If you’re going to move into Cloud from traditional hosting or system development practices then starting to learn about programmable infrastructure is a must.  Understanding the APIs available will depend on leveraging existing documentation:

If you aren’t a fan of working so close to the wire you can always leverage one of the associated SDKs in the language of your choice:

The great thing about having .Net support is you can then leverage those SDKs directly in PowerShell and automate a lot of items via scripting.

Developer Tool Support

While having an SDK is fine there’s also a need to support developers within whatever IDE they happen to be using.  Luckily you get support here too:

Source Control and Release Management

The final piece of the puzzle and one not necessarily tied to the individual Cloud providers is where to put your source code and how to deploy it.

  • Amazon Web Services: You can leverage Elastic Beanstalk for deployment purposes (this is a part of the Visual Studio and Eclipse toolkits). http://aws.amazon.com/elasticbeanstalk/
  • Google App Engine: Depending on language you have a few options for auto-deploying applications using command-line tools from build scripts.  Eclipse tooling (covered above) also provides deployment capabilities.
  • Rackspace Cloud: no publicly available information on build and deploy.
  • Windows Azure: You can leverage deployment capabilities out of Visual Studio (probably not the best solution though) or utilise the in-built Azure platform support to deploy from a range of hosted source control providers such as BitBucket (Git or Mercurial), Codeplex, Dropbox (yes, I know), GitHub or TFS.  A really strong showing here from the Azure platform! http://www.windowsazure.com/en-us/develop/net/common-tasks/publishing-with-git/

So, there we have it – probably one of the most link-heavy posts you’ll ever come across – hopefully the links will stay valid for a while yet!  If you spot anything that’s dead or that is just plain wrong leave me a comment.

HTH.

SharePoint Online 2013 ALM Practices

SharePoint has always been a bit a challenge when it comes to structured ALM and developer practices which is something Microsoft partially addressed with the release of SharePoint and Visual Studio 2010. Deploying and building solutions for SharePoint 2013 pretty much retains most of the IP from 2010 with the noted deprecation of Sandbox Solutions (this means they’ll be gone in SharePoint vNext).

As part of the project I’m leading at Kloud at the moment we are rebuilding an Intranet so it runs on SharePoint Online 2013 so I wanted to share some of the Application Lifecycle Management (ALM) processes we’ve been using.

Packaging

Most of the work we have been doing to date has leveraged existing features within the SharePoint core – we have, however, spent time utilising the Visual Studio 2012 SharePoint templates to package our customisations so they can be moved between multiple environments. SharePoint Online still provides support for Sandboxed Solutions and we’ve found that they provide a convenient way to deploy elements that are not developed as Apps. Designer packages can also be exported and edited in Visual Studio and produce a re-deployable package (which result in Sandboxed Solutions).

Powershell

At the time of writing, the number of Powershell Commandlets for managing SharePoint Online are substantially less those for on-premise. If you need to modify any element below a Site Collection you are pretty much forced to write custom tooling or perform the tasks manually – we have made a call in come cases to build tooling using the Client Side Object Model (CSOM) or to perform tasks manually.

Development Environment

Microsoft has invested some time in the developer experience around SharePoint Online and now provides you with free access to an “Office 365 Developer Site” which gives you a single-license Office 365 environment in which to develop solutions. The General Availability of Office 365 Wave 15 (the 2013 suite) sees these sites only being available for businesses holding enterprise (E3 or E4) licenses.  Anyone else will need to utilise a 30 day trial tenant.

We have had each team member setup their own site and develop solutions locally prior to rolling them into our main deployment. Packaging and deployment is obviously key here as we need to be able to keep the developer instances in sync with each other and the easiest way to achieve that is with WSPs that can be redeployed as required.

One other item we have done around development is to utilise an on-premise setup in a VM to provide developers with a more rapid development experience in some cases (and more transparent troubleshooting). As you mostly stick to the SharePoint CSOM a lot of your development these days resides in JavaScript which means you shouldn’t hit any snags in relying in on-premise / full-trust features in your delivered solutions.

Note that the Office 365 Developer Site is a single-license environment which means you can’t do multi-user testing or content targeting. That’s where test environments come into play!

Test Environment

The best way to achieve a more structured ALM approach with Office 365 is to leverage an intermediate test environment – the easiest way for anyone to achieve this is to register for a trial Office 365 tenant – while only technically available for 30 days this still provides you with the ability to test prior to deploying to your production environment.

Once everything is tested and good to go into production you’re already in a position to know the steps involved in deployment!

As you can see – it’s still not a perfect world for SharePoint ALM, but with a little work you can get to a point where you are at least starting to enforce a little rigour around build and deployment.

Hope this helps!

Useful Links

Call Web Services From Workflows on SharePoint Online

The increasing adoption of Office 365 is driving a lot of traditional development on the SharePoint platform online. As you might expect there are some big differences between on-premise and cloud and the ways in which you achieve customisation and implementation of features.

Traditionally timer jobs played a large part in the way background services could be implemented in SharePoint. You will find that timer jobs are absent in SharePoint Online and that the alternative is to leverage the workflow capabilities of SharePoint to achieve the same sort of outcome.

A fairly typical scenario for timed jobs is to poll external services for some form of information to be cached locally on SharePoint. The good news is that the standard Call HTTP Web Service Action of SharePoint 2013 workflows execute the same in Office 365 as they do on-premise.

There is a blog post and demo on MSDN that you can use to test this out for yourself.

Gotcha: and a fairly big (and unobvious) one: this workflow Action can only handle calls to webservices that return responses of type text/html, text/plain and application/json. You will find you are unable to accept and process text/xml responses and the Action will only pipe the response from a webservice call to a Dictionary object so you can’t even do any string manipulation foo on the result if it’s not one of the three response types accepted!

Hope this post saves you some time!

Save Bytes, Your Sanity and Money

In this day of elastic on-demand compute resource it can be easy to lose focus on how best to leverage a smaller footprint when it’s so easy to add capacity. Having spent many a year working on the web it’s interesting to see how development frameworks and web infrastructure has matured to better support developers in delivering scalable solutions for not much effort. Still, it goes without saying that older applications don’t easily benefit from more modern tooling and even newer solutions sometimes fail to leverage tools because the solution architects and developers just don’t know about them. In this blog post I’ll try to cover off some and provide background as to why it’s important.

Peak hour traffic

We’ve all driven on roads during peak hour – what a nightmare! A short trip can take substantially longer when the traffic is heavy. Processes like paying for tolls or going through traffic lights suddenly start to take exponentially longer which has a knock-on effect to each individual joining the road (and so on). I’m pretty sure you can see the analogy here with the peaks in demand that websites often have, but, unlike on the road the web has this problem two-fold because your request generates a response that has to return to your client (and suffer a similar fate).

At a very high level the keys to better performance on the web are:

  • ensure your web infrastructure takes the least amount of time to handle a request
  • make sure your responses are streamlined to be as small as possible
  • avoid forcing clients to make multiple round-trips to your infrastructure.

All requests (and responses) are not equal

This is subtle and not immediately obvious if you haven’t seen how hosts with different latencies can affect your website. You may have built a very capable platform to service a high volume of requests but you may not have considered the time it takes for those requests to be serviced.

What do I mean?

A practical example is probably best and is something you can visualise yourself using your favourite web browser. In Internet Explorer or Chrome open the developer tools by hitting F12 on your keyboard (in IE make sure to hit “Start Capturing” too) – if you’re using Firefox, Safari, et al… I’m sure you can figure it out ;-) . Once open visit a website you know well and watch the list of resources that are loaded. Here I’m hitting Google’s Australia homepage.

I’m on a very low latency cable connection so I have a response in the milliseconds.

Network view in Internet Explorer

Network view in Google Chrome

This means that despite the Google homepage sending me almost 100 KB of data it serviced my entire request in under half a second (I also got some pre-cached goodness thrown in which also makes the response quicker). The real interest beyond this is what is that time actually made up of? Let Chrome explain:

Request detail from Google Chrome

My client (browser) spent 5ms setting up the connection, 1ms sending my request (GET http://www.google.com.au/), 197ms waiting for Google to respond at all, and then 40ms receiving the response. If this was a secure connection there would be more setup as my client and the server do all the necessary encryption / decryption to secure the message transport.

As you can imagine, if I was on a high latency connection each one of these values could be substantially higher. The net result on Google’s infrastructure would be:

  • It takes longer to receive the full request from my client after connection initialisation
  • It takes longer to stream the full response from their infrastructure to my client.

Both of which means my slower connection would use Google’s server resources for longer thus stopping those resources servicing another request.

As you can see this effectively limits the infrastructure to run at lower capacity than it really could and also demonstrates why performing load testing requires that you run test agents that utilise different latencies so you can gauge realistically what your capacity is.

Some things you can do

Given you have no control over how or where the requests will come from there are a few things you can do to help reduce the effect of low latency clients will impact your site.

  1. Reduce the number of requests or round trips: often overlooked but is increasingly becoming easier to achieve. The ways you can achieve a reduction in requests include:
    1. Use a CDN for resources: Microsoft and Google both host jQuery (and various jQuery plugins) on their CDNs. You can leverage these today with minimal effort. Avoid issues with SSL requests by mapping the CDN using a src attribute similar to “//ajax.googleapis.com/ajax/libs/jquery/1.6.1/jquery.min.js” (without the http: prefix). Beyond jQuery push static images, CSS and other assets to utilise a CDN (regardless of provider) – cost should be no big issue for most scenarios.
    2. Bundle scripts: most modern sites make heavy use of JavaScript and depending on how you build your site you may have many separate source files. True, they may only be a few KB but each request a client makes will need to go through a process similar to the above. Bundling refers to the combining of multiple JavaScript files into a single download. Bundling is now natively supported in ASP.Net 4.5 and is available in earlier versions through third-party tooling for either runtime or at-build bundling. Other platforms and technologies offer similar features.
    3. Use CSS Sprites: many moons ago each individual image reference in CSS would be loaded as an individual asset onto your server. While you can still do this the obvious net effect is the need to request multiple assets from the server. CSS sprites combine multiple images into one image and then utilise offsets in CSS to show the right section of the sprite. The upside is also client-side caching means any image reference in that sprite will be serviced very quickly.
    4. Consider inline content: there I said it. Maybe include small snippets of CSS or JavaScript in the page itself. If it’s the only place it’s used why push it to another file and generate a second request for this page? Feeling brave? You could leverage the Data URI scheme for image or other binary data and have that inline too.
  2. Reduce the size of the resources you are serving using these approaches:
    1. Minification: make sure you minify your CSS and JavaScript. Most modern web frameworks will support this natively or via third-party tooling. It’s surprising how many people overlook this step and on top of that also don’t utilise the minified version of jQuery!
    2. Compress imagery: yes, yes, sounds like the pre-2000 web. Know what? It hasn’t changed. This does become increasingly difficult when you have user generated content (UGC) but even there you can provide server-side compression and resizing to avoid serving multi-MB pages!
    3. Use GZIP compression: there is a trade-off here – under load can your server cope with the compression demands? Does the web server you’re using support GZIP of dynamic content? This change, while typically an easy one (it’s on or off on the server) requires testing to ensure other parts of your infrastructure will support it properly.
  3. Ensure you service requests as quickly as possible – this is typically where most web developers have experience and where a lot of time is spent tuning resources such as databases and SANs to ensure that calls are as responsive as possible. This is a big topic all on its own so I’m not going to dive into it here!

If you’re a bit lost were to start it can pay to use tools like YSlow from Yahoo! Or PageSpeed from Google – these will give you clear guidance on areas to start working on. From there it’s a matter of determining if you need to make code or infrastructure changes (or both) to create a site that can scale to more traffic without needing to necessarily obtain more compute power.

Hope you’ve found this useful – if you have any tips, suggestions or corrections feel free to leave them in the comments below.

Clean Backups Using Windows Server Backup and EBS Snapshots

One of the powerful features of AWS is the ability to snapshot any EBS volume you have operating. The challenge when utilising Windows 2008 R2 is that NTFS doesn’t support point-in-time snapshotting which can result in an inconsistent EBS snapshot. You can still create a snapshot but it may be that one or more file that was mid-change may not be accurate (or usable) in the resulting snapshot.

How to solve

The answer, it turns out, is fairly easy (if a little convoluted) – Windows Server Backup PLUS EBS snapshots.

In order for this to work you will need:

1. Windows Server Backup installed on your Instance (this does not require a reboot on install).

2. An appropriately sized EBS volume to support your intended backup (see the Technet arcticle on how to work out your allowance – remember you can always create a new volume later if you get the size wrong!)

3. Windows Task Scheduler (who needs Quartz or cron ;) ).

4. Some Powershell Foo.

Volume Shadow Copy using the new EBS Volume

Once you have created your new EBS volume and attached it to the Instance you want to backup, open up Windows Server Backup and select what you want to backup (full, partial, custom – to be honest you shouldn’t need to do a “bare metal” backup….)  Note that Windows Server Backup will delete *everything* on the target EBS volume and it will no longer show up in Windows Explorer – it will still be attached and show up in Disk Manager, but you can’t copy files to or from it.

The benefit of Windows Server Backup is that it will utilise Windows Volume Shadow Copy Service (VSS)  resulting in a clean copy of files – even those in-flight at the time of backup.  This gets around the inability to “freeze” NTFS at runtime.

Set your schedule and save the backup.

Snapshot the backup volume

Now you have a nice clean static copy of the filesystem you can go ahead and create an EBS snapshot.  The best way I found to schedule this was to use some Powershell magic written by the guys over at messors.com.  I slightly modified the Daily Snapshot Powershell to snapshot a pre-defined Instance and Volume by calling the CreateSnapshotForInstance function that takes an Instance and Volume identifier.  The great thing with this script setup is it auto-manages the expiry of old snapshots so my S3 usage doesn’t just keep on growing.

I invoke this Powershell on the success of the initial Windows Server Backup scheduled task by using the Task Scheduling – see this post at Binary Royale on how to do this (they send an email – I use the same approach to do a snapshot instead).

The great thing about the Powershell is that it will send you an email once completed – I have also setup mail notifications at completion of the Windows Server Backup (success, failure, disk full) so I can know when / if the backup fails at an early stage.  Note that in Windows Server 2008 R2 you can send an email right in Windows Task Scheduler – on AWS you’ll need to have an IIS smart host up and running to support this though – see my earlier post on how to set up a IIS SES relay host.

Summary

The benefit of this approach is a consistent state for your backup contents as well as an easy way to do a controlled restore at a later date (“create volume from snapshot” in the AWS management console and then attach to an Instance).  Additionally you achieve a reasonably reliable backup for not much more than the cost of EBS I/O and S3 I/O and storage costs.

Hope you find this useful.

Gone in 60 Seconds – Cloud Load Balancing Timeouts

After what seems like ages we finally managed to get to the bottom of the weird issues we were having with a load balanced instance of Umbraco hosted at Amazon Web Services.  Despite having followed Umbraco’s load balancing guide and extensively tested we were seeing some weird behaviours for multi-page publishing and node copy operations.  Multiple threads were created for each request which lead to unexpected outcomes for users.

Eventually we managed to isolate this to the default (and unchangeable at time of writing) 60 second timeout behaviour of AWS’ ELB infrastructure combined with the long-running nature of the POST requests from Umbraco.  The easiest way to see the real side effect of the 60 second timeout is to put a debugging proxy like Fiddler or Charles between your browser and the ELB.  What we saw is below.

gsixty

So, you can see the culprit right there in the red square – the call to the publish.aspx page is terminated at 60 seconds by the ELB which causes the browser to resubmit it – Ouch!  This also occurs when you copy or move nodes and the process exceeds 60 seconds – you get multiple nodes!

To be clear – this is not a problem that is isolated to Umbraco – there is a lot of software that relies on long-running HTTP POST operations with the expectation that they will run to completion.

Now there are probably a range of reasons why AWS has this restriction – the forum posts (dating back to 2009) don’t enlighten but it’s not hard to see why, in an “elastic” environment anything that takes a long time to complete may be a bad thing (you can’t “scale up” your ELB if it’s still processing a batch of long-running requests).  I can see the logic to this restriction – it simplifies the problems the AWS engineers need to solve, but it does introduce a limitation that isn’t covered clearly enough in any official AWS documentation.

The real solution here has to come from better software design that takes into account this limitation of the infrastructure and makes use of patterns like Post-Redirect-Get to submit a short POST request to initiate the process on the server, redirect to another page and then utilise async calls from the browser to check on the status of the process.

Yes, I know, we could probably run our own instances with HA Proxy on, but why build more infrastructure to manage when what’s there is perfectly fit for purpose?

Updated – You Have An Alternative

10 September – I’ve been lucky enough to be attending the first AWS Achitecture course run by Amazon here in Sydney and the news on this front is interesting.  By default you get 60 seconds, *but* you can request (via your AWS Account Manager or Architect) that this timeout be increased up to 17 minutes maximum.  This is applied on a per-ELB basis so if you create more ELB instances you would need to make the same request to AWS.

My advice: fix your application before you ask for a non-standard ELB setup.

Not Just For AWS ELB

Now, chaps (and ladies), you also need to be aware that this issue will raise its head in Windows Azure as well but most likely after a longer duration.  A very obliquely written blog post on MSDN suggests it will be now be based on the duration AND the number of concurrent connections you have.

You have been warned!

You Can See The Storm Early If You Watch The Clouds

Here we are at just past the half way mark of 2012 and it’s time to ask yourself “do I have any skin in the cloud game?”

Through 2011 and 2012 the relative strength of the Australian dollar has meant the cost of entry to the major cloud platforms has dropped significantly for Australian businesses.  KPMG is estimating that if 75% of Australian businesses moved to cloud services that this would have a positive influence on Australia’s GDP to the tune of $3.32 billion annually!

I will admit to having been a cloud sceptic in past – certainly of the Azure platform, but with the recent set of changes introduced by Microsoft I’d say that Azure is now mature enough that it can be considered a competitive option against Amazon Web Services for complex application build and host (certainly in the .Net space).

What’s Your Cloud Tier?

I will add some more criteria to my original question – are you engaged on a “Tier 1″ public cloud platform?  I’d classify Tier 1 as any of the mature global players – Amazon Web Services, Microsoft Azure, Google App Engine are probably top three here (there are others but I’m not going to try to rate / rank the multitude available…)

Beyond this I’d classify a range of local “Tier 2″ providers that don’t compete on the same global scale as Tier 1 but offer similar sorts of options.  For the most part these providers tend to be not much more than highly virtualised traditional hosting businesses where you aren’t actually that far away from the bare metal.

Private Clouds

If you’re doing something like a Virtual Private Cloud (VPC) on a Tier 1 cloud platform then OK, you’re in the game.  If you built your own “Private Cloud” (whatever that is) or you’re running virtual machines in a Tier 2 provider then I’m afraid you can go and take a shower and head home.

My point here is that if you’re needing to care about anything vaguely hardware related or you don’t have global reach as an option on your platform then you’re not truly in the cloud.

Architectural Change

Once upon a time we had to care about resource utilisation when we ran our applications in the shared context of a mainframe that had limited (and expensive) resources.  The commoditisation of compute resources over the last 30 years means we stopped caring about the cost of CPU, RAM, Disk and network resources (for the most part).  Virtualisation only added to this.

Also, we controlled entire platforms top-to-bottom so we could tune or tweak aspects of the platform to suit our demands.  Virtualisation took away some of this flexibility but if you’ve ever spent any time managing VMWare or other systems you’ll know that there’s an large number of tuning possibilities that make the virtualisation layer almost entirely transparent.

We have become lazy in our architectural practices.  We stopped needing to solve some challenges because we could assume them away based on tuning our resources. Guess what?  We don’t get that any more with the cloud.  We still get commoditised resources but we also share it with other tenants.  This means we do need to start solving these challenges again through better architectural design.

Examples of common cloud scenarios that we haven’t had to solve recently our own platforms include:

  • Bandwidth constrained shared LAN segments.
  • I/O constrained disk access.
  • Transient component failure.
  • Dynamic scale up / scale down.
  • Pay-per-use for LAN traffic, disk access and other components.

If you’re not changing your architectural practices to take the above into your designs then you can also pack the bags and head home.

Additionally, if you’re a vendor and you aren’t making your licensing work for dynamic scale up / scale down you also lose the right to a spot on the team (and I have spoken to some vendors who aren’t supporting the cloud because it will gut their licensing model – not that they said it in so many words!)

In Summary

Ultimately the point I am trying to make is that if you haven’t been actively engaged in looking how you can move to the cloud then you are already too late to gain any form of competitive advantage in moving to it.  You should immediately start looking at ways to utilise the cloud even if it is only via small-scale deployments that are not necessarily related to key parts of your business.

I know I haven’t touched here on the data privacy / jurisdiction issues that are obviously a big issue for most Australian businesses, but there are ways to work around those challenges in the way you design and build your solutions.   Also, it’s highly likely we will also see at least one Tier 1 cloud provider here in Australia  with a full offering prior to the end of 2013. You should be getting ready now.

Finally, despite my obvious advocacy for the cloud you should always be aware of shamen.
Love consultants!