Customised Serverless using Azure Functions, Docker and Kubernetes Event-Driven Autoscaler (KEDA)


As Azure Functions has matured, the number of ways you can build and run Functions has increased, and the need to execute long-running, compute-intensive or complex jobs is becoming more common. Some examples of these emerging job types include running Machine Learning models or batch processing videos to transcode them.

Additionally, as serverless technology becomes part of the fabric of how businesses build software, I am seeing increasing demand for better control over what is deployed and how it is operated as part of serverless solutions.

In this post I am going to demonstrate how you can:

  • Use Azure Functions as your event-driven host regardless of environment
  • Customise the Functions execution sandbox in a secure and industry-standard way
  • Deploy Azure Functions in Containers on Azure and on Kubernetes using KEDA.

The Scenario

For this blog I'm going to show you how you can use FFMPEG to create a thumbnail image from a video file. This will be something you've experienced as a user if you've ever uploaded a video to YouTube.

FFMPEG Process with Functions

The biggest challenge in this scenario is that the size (and running time) of some video files will be substantial, which means each Functions worker will be busy for quite some time creating thumbnails.

This long processing time is due to downloading the file, processing it using FFMPEG, then uploading the resulting thumbnail back to Blob Storage. We have to download the video file locally because the FFMPEG process we use to create the thumbnail can't work on streams - it needs local file access.

Our sample project

I've written a simple Node-based Azure Function for this post which you can find on GitHub. I'm using a great Node library to wrap FFMPEG and provide a simple way to generate a strip of thumbnails (see sample below).

This Function uses an Azure Storage Queue trigger and assumes that the queue message contains the name of the video file to have thumbnails created.
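
To give you a feel for the shape of the code, here's a minimal sketch of such a Function. It's illustrative only - it assumes the fluent-ffmpeg npm package and blob input/output bindings named 'videoBlob' and 'thumbnailBlob' in function.json; the actual sample on GitHub differs in its details.

// Minimal sketch only - not the exact sample from GitHub.
const ffmpeg = require('fluent-ffmpeg');
const fs = require('fs');
const os = require('os');
const path = require('path');

module.exports = async function (context, queueItem) {
    // The queue message is the name of the video blob to process.
    const localVideo = path.join(os.tmpdir(), queueItem);

    // FFMPEG can't work on streams, so write the video out to local disk first.
    fs.writeFileSync(localVideo, context.bindings.videoBlob);

    // Grab a frame one second in to use as the thumbnail.
    await new Promise((resolve, reject) => {
        ffmpeg(localVideo)
            .on('end', resolve)
            .on('error', reject)
            .screenshots({ timestamps: [1], filename: 'thumb.png', folder: os.tmpdir() });
    });

    // Hand the thumbnail to the output binding so it's uploaded to Blob Storage.
    context.bindings.thumbnailBlob = fs.readFileSync(path.join(os.tmpdir(), 'thumb.png'));
};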

If you want to try this Function out for yourself you'll need Visual Studio Code and the v2 release of the Azure Functions Core Tools, which let you debug locally. I developed using the VS Code Remote Development tools, which allowed me to run the Function host on Windows Subsystem for Linux (WSL). Full tool specs are listed on GitHub.

I used an openly licensed video from Pixabay for the demo - the video below can be downloaded from them.

When you execute the Function and supply the above video the resulting thumbnail image below is produced.

Video Thumbnail

Let the fun begin!

If you run the above Function sample on a newly built machine with the default OS installation you'll probably receive an error similar to 'ffmpeg: command not found'.

That's because FFMPEG isn't a standard part of the OS, but you can install it. On my WSL Ubuntu host I used the following command:

sudo apt-get install ffmpeg
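
You can verify the install (and see exactly which version you received) with:

ffmpeg -version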

You can probably guess, based on what I've written above, that FFMPEG isn't on the Linux hosts used by Azure Functions in Azure App Service 🙂.

We can confirm this by uploading our Function (and its configuration) to a Linux Consumption Plan (see how to create one in the docs) and trying it out.
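
If you prefer the CLI to the docs walkthrough, creating a Linux Consumption Plan app looks roughly like this - the names and region are placeholders and flags can vary between Azure CLI versions:

az functionapp create --name <app_name> \
   --storage-account <storage_name> \
   --resource-group myResourceGroup \
   --consumption-plan-location westus \
   --os-type Linux \
   --runtime node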

The result is...

2019-06-21T07:52:29.312 [Information] { Error: spawn ffprobe ENOENT
    at Process.ChildProcess._handle.onexit (internal/child_process.js:190:19)
    at onErrorNT (internal/child_process.js:362:16)
    at _combinedTickCallback (internal/process/next_tick.js:139:11)
    at process._tickCallback (internal/process/next_tick.js:181:9)
  errno: 'ENOENT',
  code: 'ENOENT',
  syscall: 'spawn ffprobe',
  path: 'ffprobe',
  spawnargs:
   [ '-print_format',
     'json',
     '-show_error',
     '-show_format',
     '-show_streams',
     '/tmp/838fcd10-93f9-11e9-b2ab-59607ad195b6-baking.mp4' ],
  stdout: '',
  stderr: '',
  failed: true,
  signal: null,
  cmd: 'ffprobe -print_format json -show_error -show_format -show_streams /tmp/838fcd10-93f9-11e9-b2ab-59607ad195b6-baking.mp4',
  timedOut: false,
  killed: false }

Ummm.. computer says no! 😐

Now at this stage you're probably thinking you can include the necessary FFMPEG binaries with your Function App and deploy them that way... unfortunately not!

The first time I ran into this it annoyed me, but I've come to realise that it's actually a good thing you can't just upload arbitrary binaries or system libraries as part of your Functions code. Four reasons for this:

  1. You should only be publishing your source code because it's what you've tested and validated.
  2. If you upload binaries or libraries they might not be compatible with the host OS, leading to unexpected behaviours or crashes.
  3. Most executables rely on other libraries or binaries (did you watch what was installed when you installed FFMPEG?). That means you'd need to install all of those too!
  4. Did you security scan the binaries you bundled with your solution?

So how do we solve this challenge with Functions?

Say hello to my little friend

The Functions team has a solid way to support demands such as running external binaries - you can build your own Docker image and use that in Azure.

If you go and look at the sample project on GitHub you will notice that there is already a Dockerfile that was added to the existing project using this command in the root folder:

func init --docker-only

This command leaves your existing project and configuration files untouched and simply adds a standard Dockerfile that bundles your code and extensions into a Container based on the Functions runtime host Docker image (see the Dockerfile for Node on GitHub).
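
For a Node project the generated Dockerfile is pleasingly short - approximately the following (the exact base image tag and settings depend on the Core Tools version you used):

FROM mcr.microsoft.com/azure-functions/node:2.0
ENV AzureWebJobsScriptRoot=/home/site/wwwroot
COPY . /home/site/wwwroot
RUN cd /home/site/wwwroot && npm install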

Now for the best bit - the Dockerfile is editable :) and if you look at the Dockerfile on GitHub you will see I have customised it:

# Install FFMPEG
RUN apt-get update && \
    apt-get install -y ffmpeg

If we build the Container Image we'll end up with FFMPEG (and its dependencies) installed with the right execution settings.

Deploying to App Service

When you first create a Linux Azure Function host in Azure you have to specify whether it will be a 'code' deployment (just push your own code onto the existing runtime) or a 'container' deployment (push a Docker image that has your code in it). This means we can't reuse the Function plan from our earlier test, where the code failed due to the missing FFMPEG binaries.

The Functions team has documented how you deploy an image which is on learn.microsoft.com.

I used Azure Container Registry (ACR) to hold my images which I built and published locally on WSL using the following commands (these assume you've already logged into Azure using 'az login' and ACR using 'docker login'):

docker build -t my_acr_instance.azurecr.io/custom-func-node:1.0 .
docker push my_acr_instance.azurecr.io/custom-func-node:1.0

Now that we've pushed our Function image to ACR we can benefit from Content Trust and security scanning from Aqua or Twistlock, which will keep the security folks happy!
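
As an aside, Content Trust is a per-registry switch (note that it requires the Premium ACR SKU) and can be enabled like so:

az acr config content-trust update --registry my_acr_instance --status enabled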

You can create your new Linux Azure Function and pass the ACR reference as the source image:

az functionapp create --name <app_name> \
   --storage-account <storage_name> \
   --resource-group myResourceGroup \
   --plan myAppServicePlan \
   --deployment-container-image-name my_acr_instance.azurecr.io/custom-func-node:1.0

When you deploy your image to an App Service Plan you will also need to deploy your Function's settings using the Azure CLI (see the documentation) or via the Azure Portal.
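
A sketch of pushing settings via the CLI - the setting names below are placeholders; your actual names come from your local.settings.json and function.json:

az functionapp config appsettings set --name <app_name> \
   --resource-group myResourceGroup \
   --settings "AzureWebJobsStorage=<connection_string>" \
              "StorageQueueConnection=<connection_string>"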

If you run into issues after deployment, the debugging experience is a little unpolished at the time of writing - you need to enable diagnostic logging (find it under "Platform features") and then download the logs as a zip file using Advanced tools (Kudu).

Diagnostics Logs

Now that we've conquered running customised Functions environments on Azure App Service Plans let's move on to the last target environment.

Azure Functions on Kubernetes

Why might you look to Kubernetes (and KEDA) over an Azure App Service Plan with a Docker Image? Here are some reasons:

  1. You're already using Kubernetes for other workloads, so you want to have a consistent management experience.
  2. You'd like to use Azure Functions but want them attached only to a private network, and don't want to run an App Service Environment (ASE) to achieve that.
  3. You want to benefit from rapid scale and per-second billing by using Virtual Nodes on Azure Kubernetes Service (AKS).

As we already have our Azure Function containerised we are most of the way to having things up and running using KEDA!

For this blog I spun up an Azure Kubernetes Service (AKS) cluster with a single node and ensured that I enabled the Virtual Node capability at creation time. As AKS and ACR live in the same Azure subscription they already share a common identity connection, so AKS can easily pull image updates from ACR.

Before we deploy our Containerised Function we need to deploy KEDA to AKS by running the following command:

func kubernetes install --namespace keda

This command assumes your local Kubernetes context has the right cluster selected.
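
You can confirm the KEDA controller is up before moving on:

kubectl get pods --namespace keda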

Now we can create our Kubernetes deployment quite easily based on the existing Container Image and the local Functions tooling:

func kubernetes deploy --name video-thumbnailer \
     --registry my_acr_instance.azurecr.io \
     --javascript --dry-run > deploy.yaml

This will produce a deployment YAML file we can re-use. I do want to make a few edits, which are captured in the Gist below and explained next; an abridged sketch of the edited sections follows the Gist.

  • Update the container image (line 34) to: my_acr_instance.azurecr.io/custom-func-node:1.0
  • Allow the deployment to be scheduled onto Virtual Nodes (lines 41 and 42)
  • For this demo add queueLength: '2' (line 61) to the ScaledObject definition which will force scale-out behaviour quickly.

https://gist.github.com/sjwaight/83aa95e68fde9c35400018d6c2d7a268
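
For orientation, here's what those edits look like in the generated YAML - this is an abridged, illustrative sketch only, so refer to the Gist for the full file:

# Deployment: use our custom image and tolerate the virtual-kubelet taint
spec:
  template:
    spec:
      containers:
      - name: video-thumbnailer
        image: my_acr_instance.azurecr.io/custom-func-node:1.0
      tolerations:
      - key: virtual-kubelet.io/provider
        operator: Exists
---
# ScaledObject: scale out aggressively for the demo
apiVersion: keda.k8s.io/v1alpha1
kind: ScaledObject
metadata:
  name: video-thumbnailer
spec:
  scaleTargetRef:
    deploymentName: video-thumbnailer
  triggers:
  - type: azure-queue
    metadata:
      queueName: <your_queue_name>
      queueLength: '2'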

We can deploy this to our Kubernetes cluster simply by running this command:

kubectl apply -f deploy.yaml

If you aren't running any other workloads on your cluster you can see what happens when KEDA triggers a scale event by running the following command:

kubectl get pods -w

At first this should show no pods. Now head over to Storage Explorer, start pumping in video references, and watch what happens to your pods list!
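
If you'd rather stay on the command line than use Storage Explorer, you can drop messages in with the Azure CLI - the queue name and message here are illustrative, and the message needs to be the name of a video file in your Blob container:

az storage message put --queue-name <your_queue_name> \
   --content "baking.mp4" \
   --connection-string "<storage_connection_string>"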

🆒

The scale-in behaviour is driven by the default configuration of ScaledObjects: after 300 seconds with no new traffic the pods are torn down, reducing to our minimum count of zero :). You can control this cooldown period by editing the ScaledObject definition before deploying.
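
For example (an illustrative excerpt using the KEDA v1 field names):

apiVersion: keda.k8s.io/v1alpha1
kind: ScaledObject
metadata:
  name: video-thumbnailer
spec:
  cooldownPeriod: 120   # seconds of inactivity before scale-in (default 300)
  minReplicaCount: 0    # allow scale all the way down to zero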

What's next?

The Azure Functions host is cross-platform and open source, which means it offers a compelling baseline for platform-agnostic event-driven programming.

Coupled with the extensible, open source extensions model for bindings and triggers, the only limit to it solving your problem is the time to build the extensions you need.

Once we see maturity in the CNCF's CloudEvents space, we will have an event orchestrator that can run anywhere and be triggered by and integrated with anything.

Using Azure Functions with Kubernetes, KEDA and Osiris brings us to the point where any developer can truly use "serverless" to solve their problems.

Happy Days 😎

Comments or questions? Leave a comment below!

If you've read this far here's a bonus for you - slides from a related talk are available on Speaker Deck: https://speakerdeck.com/sjwaight/azure-functions-custom-containers-and-keda