Save Bytes, Your Sanity and Money

In this day of elastic, on-demand compute resources it can be easy to lose focus on how best to leverage a smaller footprint when it's so easy to add capacity. Having spent many a year working on the web, I find it interesting to see how development frameworks and web infrastructure have matured to help developers deliver scalable solutions for not much effort. Still, older applications don't easily benefit from more modern tooling, and even newer solutions sometimes fail to use the tools available because the solution architects and developers just don't know about them. In this blog post I'll try to cover some of those tools and provide background as to why they're important.

Peak hour traffic

We've all driven on roads during peak hour – what a nightmare! A short trip can take substantially longer when the traffic is heavy. Processes like paying tolls or going through traffic lights suddenly start to take far longer, which has a knock-on effect on each driver joining the road (and so on). I'm pretty sure you can see the analogy here with the peaks in demand that websites often have, but unlike on the road, the web has this problem twice over because your request generates a response that has to return to your client (and suffer a similar fate).

At a very high level the keys to better performance on the web are:

ensure your web infrastructure takes the least amount of time to handle a request
make sure your responses are streamlined to be as small as possible
avoid forcing clients to make multiple round-trips to your infrastructure.

All requests (and responses) are not equal

This is subtle and not immediately obvious if you haven't seen how hosts with different latencies can affect your website. You may have built a very capable platform to service a high volume of requests but you may not have considered the time it takes for those requests to be serviced.

What do I mean?

A practical example is probably best and is something you can visualise yourself using your favourite web browser. In Internet Explorer or Chrome open the developer tools by hitting F12 on your keyboard (in IE make sure to hit "Start Capturing" too) – if you're using Firefox, Safari, et al… I'm sure you can figure it out ;-). Once open visit a website you know well and watch the list of resources that are loaded. Here I'm hitting Google's Australia homepage.

I'm on a very low latency cable connection so I have a response in the milliseconds.

This means that despite the Google homepage sending me almost 100 KB of data it serviced my entire request in under half a second (I also got some pre-cached goodness thrown in which also makes the response quicker). The more interesting question is what that time is actually made up of. Let Chrome explain:

My client (browser) spent 5ms setting up the connection, 1ms sending my request (GET http://www.google.com.au/), 197ms waiting for Google to respond at all, and then 40ms receiving the response. If this were a secure connection there would be additional setup time while my client and the server negotiate the encryption used to secure the message transport.
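To put those four figures in proportion, here is a quick tally. It is only a sketch in Python: the phase names are my own labels rather than Chrome's, and the numbers are simply the ones quoted above for this single request.

```python
# Phase timings in milliseconds, copied from the figures quoted above.
# The keys are illustrative labels, not the names Chrome uses.
phases_ms = {
    "connecting": 5,
    "sending": 1,
    "waiting": 197,
    "receiving": 40,
}

total_ms = sum(phases_ms.values())


def share(phase: str) -> float:
    """Return the fraction of the total request time spent in one phase."""
    return phases_ms[phase] / total_ms


for name, ms in phases_ms.items():
    print(f"{name:<10} {ms:>4} ms  {share(name):6.1%}")
print(f"{'total':<10} {total_ms:>4} ms")
```

The total comes to 243ms, and roughly four fifths of it is spent waiting for the first byte of the response rather than moving data.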

As you can imagine, if I was on a high latency connection each one of these values could be substantially higher. The net result on Google's infrastructure would be:

It takes longer to receive the full request from my client after connection initialisation
It takes longer to stream the full response from their infrastructure to my client.

Both of these mean my slower connection would tie up Google's server resources for longer, stopping those resources from servicing another request.

As you can see, this effectively limits the infrastructure to a lower capacity than it could otherwise achieve. It also demonstrates why load testing should use test agents with a range of latencies, so you can realistically gauge what your capacity is.
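A rough back-of-the-envelope model makes the point. This sketch assumes a deliberately simple server in which each in-flight request holds one worker from the first byte received to the last byte sent. The pool size and all the timings are hypothetical numbers picked for illustration, and real servers (especially ones sitting behind a buffering proxy) will behave better than this.

```python
def max_requests_per_second(workers: int, held_ms: float) -> float:
    """Upper bound on throughput when every request holds a worker for held_ms."""
    return workers * 1000.0 / held_ms


WORKERS = 100      # hypothetical size of the worker pool
SERVER_MS = 200.0  # hypothetical time the server needs to build a response

# Hypothetical extra time spent receiving the request from, and streaming
# the response to, clients on connections of differing quality.
client_transfer_ms = [("fast", 50.0), ("medium", 300.0), ("slow", 800.0)]

for label, transfer_ms in client_transfer_ms:
    held_ms = SERVER_MS + transfer_ms
    rps = max_requests_per_second(WORKERS, held_ms)
    print(f"{label:<6} clients: worker held {held_ms:6.0f} ms -> at most {rps:5.0f} req/s")
```

Under these assumptions the same hardware tops out at 400 requests per second for the fast clients but only 100 for the slow ones, which is why a load test run entirely from a fast local network can overstate your real capacity.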

Some things you can do

Given you have no control over how or where the requests will come from, there are a few things you can do to reduce the impact that high latency clients have on your site.

Reduce the number of requests or round trips: often overlooked but increasingly easy to achieve. The ways you can achieve a reduction in requests include:
1. Use a CDN for resources: Microsoft and Google both host jQuery (and various jQuery plugins) on their CDNs. You can leverage these today with minimal effort. Avoid issues with SSL requests by referencing the CDN with a protocol-relative src attribute similar to "//ajax.googleapis.com/ajax/libs/jquery/1.6.1/jquery.min.js" (without the http: prefix), so the browser fetches the script over the same scheme as the page itself. Beyond jQuery, push static images, CSS and other assets to a CDN (regardless of provider) – cost should be no big issue for most scenarios.
2. Bundle scripts: most modern sites make heavy use of JavaScript and depending on how you build your site you may have many separate source files. True, they may only be a few KB each, but every request a client makes will need to go through a process similar to the above. Bundling refers to the combining of multiple JavaScript files into a single download. Bundling is now natively supported in ASP.NET 4.5 and is available in earlier versions through third-party tooling for either runtime or build-time bundling. Other platforms and technologies offer similar features.
3. Use CSS Sprites: many moons ago each individual image reference in CSS would be loaded as an individual asset from your server. While you can still do this, the obvious net effect is the need to request multiple assets from the server. CSS sprites combine multiple images into one image and then use offsets in CSS to show the right section of the sprite. A further upside is that client-side caching means any image in that sprite will be served very quickly.
4. Consider inline content: there I said it. Maybe include small snippets of CSS or JavaScript in the page itself. If it's the only place it's used why push it to another file and generate a second request for this page? Feeling brave? You could leverage the Data URI scheme for image or other binary data and have that inline too.
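Two of the ideas above, bundling and the Data URI scheme, are simple enough to sketch in a few lines. This is an illustration only, written in Python, and it is not how ASP.NET or any particular bundling tool does it. The function names and the sample scripts are made up for the example.

```python
import base64
from typing import Iterable


def bundle(sources: Iterable[str]) -> str:
    """Combine several script sources into one download.

    A ';' is placed between files so the last statement of one file
    cannot run into the first statement of the next.
    """
    return ";\n".join(source.strip() for source in sources)


def data_uri(payload: bytes, mime_type: str) -> str:
    """Encode binary content as a base64 data: URI that can sit inline in a page."""
    encoded = base64.b64encode(payload).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


# Hypothetical stand-ins for three separate .js files (three requests).
scripts = ["var a = 1", "var b = 2", "console.log(a + b)"]
bundled = bundle(scripts)
print(bundled)  # one payload, so one request

# A tiny SVG image used as the inline payload.
svg = b'<svg xmlns="http://www.w3.org/2000/svg" width="1" height="1"/>'
print(data_uri(svg, "image/svg+xml"))
```

Keep in mind that base64 makes the payload roughly a third larger and that inline content can't be cached separately from the page, so this tends to suit small, single-use assets.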
Reduce the size of the resources you are serving using these approaches:
1. Minification: make sure you minify your CSS and JavaScript. Most modern web frameworks will support this natively or via third-party tooling. It's surprising how many people overlook this step and on top of that also don't utilise the minified version of jQuery!
2. Compress imagery: yes, yes, sounds like the pre-2000 web. Know what? It hasn't changed. This does become increasingly difficult when you have user generated content (UGC) but even there you can provide server-side compression and resizing to avoid serving multi-MB pages!
3. Use GZIP compression: there is a trade-off here – under load can your server cope with the compression demands? Does the web server you're using support GZIP of dynamic content? This change, while typically an easy one (it's on or off on the server) requires testing to ensure other parts of your infrastructure will support it properly.
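If you want a feel for what GZIP buys you before touching server configuration, you can compress a sample response yourself. The sketch below uses Python's standard gzip module. The sample markup is invented and highly repetitive, so expect it to compress far better than a typical real page would.

```python
import gzip


def gzip_sizes(body: bytes, level: int = 6) -> tuple:
    """Return (original size, gzip-compressed size) in bytes for a response body."""
    compressed = gzip.compress(body, compresslevel=level)
    return len(body), len(compressed)


# Hypothetical response body: a list of near-identical rows.
sample_body = b"<li class='item'>row</li>\n" * 500

original, compressed = gzip_sizes(sample_body)
print(f"original:   {original} bytes")
print(f"compressed: {compressed} bytes")
print(f"saving:     {1 - compressed / original:.0%}")
```

Higher compression levels trade more CPU for smaller output, which is exactly the load trade-off mentioned above. On the wire, a server only sends a compressed body to clients that advertise support with an Accept-Encoding: gzip request header, and marks the response with Content-Encoding: gzip.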
Ensure you service requests as quickly as possible – this is typically where most web developers have experience and where a lot of time is spent tuning resources such as databases and SANs to ensure that calls are as responsive as possible. This is a big topic all on its own so I'm not going to dive into it here!

If you're a bit lost as to where to start, it can pay to use tools like YSlow from Yahoo! or PageSpeed from Google – these will give you clear guidance on areas to start working on. From there it's a matter of determining if you need to make code or infrastructure changes (or both) to create a site that can scale to more traffic without necessarily needing more compute power.

Hope you've found this useful – if you have any tips, suggestions or corrections feel free to leave them in the comments below.