Reading and writing binary files with Python with Azure Functions input and output bindings

Published on
Reading time
Authors

Ah, Streams... how could I ever forget about manipulating files in code and needing to use Streams to read and write them!

I've written a lot of .NET in my life and am used to the way you read and write files with it, but I've been increasingly doing a lot of Python (BTW: .NET could learn a lot from Python's Virtual Environments construct!) and have been having to learn a whole new set of constructs to work with elements such as files.

For an upcoming demo I developed a small Python Azure Function whose job it is to create a thumbnail image of a JPEG file. You can find the source for this Azure Function on GitHub.

Azure Functions Triggers and Bindings

If you aren't familiar with Azure Functions let me briefly cover a couple of key concepts - Triggers and Bindings.

Azure Functions provide developers with a rapid development environment for event-driven solutions by having pre-built logic that removes a lot of plumbing code from your codebase.

Triggers are how an Azure Function is invoked and can come from many different sources - Timer Triggers (like cron on Linux), Blob Created Triggers, and more. You define the Trigger type when creating the Function, and the Azure Functions tooling automatically wires up the Trigger, gives you configuration options, and adds parameters to the entry point of your Function that give you access to context from the Trigger source.

In the sample Function below we have a Blob Trigger that includes a filter on the file type (JPG) that will cause the Trigger to fire. This Trigger also provides some additional parameters we can access - a Stream that contains the binary data of the file along with the "name" of the file which in this case just maps to the filename without the extension.

JpegUploadRouter.cs
using System;
using System.IO;
using Microsoft.Azure.WebJobs;
using Microsoft.Azure.WebJobs.Host;
using Microsoft.Extensions.Logging;
using Microsoft.WindowsAzure.Storage.Blob;

namespace Siliconvalve.Demo
{
    public static class JpegUploadRouter
    {
        [FunctionName("JpegUploadRouter")]
        [return: Queue("images", Connection = "customserverless01_QUEUE")]
        public static string Run([BlobTrigger("sampleuploads/{name}.jpg", Connection = "customserverless01_STORAGE")]Stream blobContent, string name, ILogger log)
        {
            log.LogInformation($"Routing image file: {name}.jpg");
            // just return the filename.
            return $"{name}.jpg";
        }
    }
}

This sample Function also uses an Output Binding. You can see the Output Binding definition in the above sample where the [return] attribute (immediately above the Run method) defines the target for the Output (in this case an Azure Storage Queue).

C# Class Library Functions behave a little differently to other languages because we can define a Binding entirely in code and don't have to configure it via a function.json file.

My Scenario

I wanted my Python Azure Function to receive a message from an Azure Storage Queue, where the message contains the name of a file (blob) that has been uploaded previously to an Azure Blob Storage Container. The file would be downloaded to the Function host, processed and then written back to Azure Blob Storage at a different location.

The queue message, which is part of the Trigger for the Function, is bound via configuration to the Input and Output Bindings in the function.json as follows. The special {queueTrigger} binding expression means that at runtime the Functions host will pass the message content to the Input and Output Bindings automatically, which means as a developer I don't need to wire anything up in my code to achieve this.

function.json
{
  "scriptFile": "__init__.py",
  "bindings": [
    {
      "name": "msg",
      "type": "queueTrigger",
      "direction": "in",
      "queueName": "images",
      "connection": "customserverless01_STORAGE"
    },
    {
      "name": "inputblob",
      "type": "blob",
      "dataType": "binary",
      "path": "sampleuploads/{queueTrigger}",
      "connection": "customserverless01_STORAGE",
      "direction": "in"
    },
    {
      "name": "outputblob",
      "type": "blob",
      "path": "thumbnails/{queueTrigger}",
      "connection": "customserverless01_STORAGE",
      "direction": "out"
    }
  ]
}

This configuration is actually based on the Python Azure Functions documentation, which is great for understanding the general format to use when creating Bindings, but the sample Function there is very basic and doesn't explore how you might manipulate the incoming file and write a file to the Output Binding.

The Solution

As it turns out the solution isn't that difficult, and I suspect much of my challenge was simply understanding how to work with file streams in Python.

The resulting Function code (see the full solution on GitHub) is shown below. You can match the 'inputblob' and 'outputblob' parameters to the function.json configuration shown above. This really demonstrates how clean your code can be when working with Bindings.

import logging
import azure.functions as func
from PIL import Image

def main(msg: func.QueueMessage, inputblob: func.InputStream,
         outputblob: func.Out[bytes]) -> None:

    blob_source_raw_name = msg.get_body().decode('utf-8')
    logging.info('Python queue trigger function processed a queue item: %s', blob_source_raw_name)

    # thumbnail filename - swap the ".jpg" extension for "_thumb.jpg"
    local_file_name_thumb = blob_source_raw_name[:-4] + "_thumb.jpg"

    #####
    # Download file from Azure Blob Storage to the Function host
    #####
    with open(blob_source_raw_name, "w+b") as local_blob:
        local_blob.write(inputblob.read())

    #####
    # Use PIL to create a thumbnail
    #####
    new_size = 200, 200
    im = Image.open(blob_source_raw_name)
    im.thumbnail(new_size)
    im.save(local_file_name_thumb, quality=95)

    # read the thumbnail back and write its bytes to the output binding
    with open(local_file_name_thumb, "rb") as new_thumbfile:
        outputblob.set(new_thumbfile.read())

The download and upload steps are the two key ones. When downloading we receive the bytes from the Input Binding and invoke the read() Python method, which reads all bytes through to the end of the stream. This is done inside the write() method of our new local file object. As it turns out this is super clean!
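To see that stream handling in isolation, here is a minimal sketch that uses io.BytesIO to stand in for the func.InputStream parameter (both expose the same file-like read() interface); the file name and bytes are made up for illustration:

```python
import tempfile
from io import BytesIO
from pathlib import Path

# Stand-in for func.InputStream: any file-like object exposing read().
inputblob = BytesIO(b"\xff\xd8\xff\xe0 pretend these are JPEG bytes")

with tempfile.TemporaryDirectory() as tmp:
    local_path = Path(tmp) / "upload.jpg"

    # read() with no argument consumes the stream through to the end and
    # returns a single bytes object, which write() persists in one call.
    with open(local_path, "w+b") as local_blob:
        local_blob.write(inputblob.read())

    roundtrip = local_path.read_bytes()

print(roundtrip == b"\xff\xd8\xff\xe0 pretend these are JPEG bytes")  # True
```

The same pattern works for files of any size that fit in memory; for very large blobs you could instead loop over read(chunk_size) until it returns an empty bytes object.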

The Output Binding is similarly clean - we call outputblob.set() to write the file data to that stream, using the local file's read() method to read the bytes of the new thumbnail from disk.
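As a variation (my own sketch, not part of the original sample): since the Bindings deal in bytes and file-like objects, you could skip the temporary files on the Function host entirely by keeping everything in BytesIO buffers. PIL's Image.open() and im.save() both accept file-like objects, so the thumbnail step fits this pattern too. Here a trivial transform and a FakeOut class stand in for PIL and func.Out[bytes]:

```python
from io import BytesIO

class FakeOut:
    """Stands in for func.Out[bytes]: the runtime collects what set() receives."""
    def __init__(self):
        self.value = None

    def set(self, data):
        self.value = data

# Stand-ins for the real binding parameters.
inputblob = BytesIO(b"original image bytes")
outputblob = FakeOut()

# Pull the whole input stream into memory. A real Function would now do:
#   im = Image.open(buffer); im.thumbnail((200, 200))
#   out = BytesIO(); im.save(out, "JPEG", quality=95); data = out.getvalue()
# (note im.save needs an explicit format when given a buffer, not a filename)
buffer = BytesIO(inputblob.read())
data = buffer.getvalue().upper()  # placeholder "processing" step

# Hand the finished bytes straight to the output binding.
outputblob.set(data)

print(outputblob.value)  # b'ORIGINAL IMAGE BYTES'
```

This avoids cleanup concerns on the Function host, at the cost of holding both the original and the processed image in memory at once.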

Note: there is one minor detail to be aware of. Right now there is a limitation with the Output Binding for Azure Blob Storage - all files are written with a Content Type of 'application/octet-stream' which means you can't easily embed them in web pages (you will be prompted to download the file). There is an open GitHub Issue for this limitation, so hopefully it will be something we can control in future. If you're not intending to serve the files via the web directly then this is a non-issue.

So there we are - how you can quickly and cleanly use Azure Functions written in Python to process files. Hopefully this post will save you some time if you're looking to do something similar in your code!

Happy Days 😎