Category Archives: Big Data

Easy Filtering of IoT Data Streams with Azure Stream Analytics and JSON reference data

I am currently working on a next-gen widget dispenser solution that is gradually being rolled out to trial sites across Australia. The dispenser hardware is a modern platform that provides telemetry data that can be used for various purposes by the locations at which the dispensers are deployed, and potentially by other third parties.

In addition to these next-gen dispensers, we already have existing dispenser hardware at these locations that emits telemetry we use for other purposes in our solution. To our benefit, both the new and the existing hardware emit telemetry data in the same format 🙂

A sample telemetry entry is shown below.
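For illustration only (the field names here are assumptions, not the actual schema), an entry might look something like:

```json
{
    "SiteId": 1234,
    "DispenserId": 2001,
    "EventTime": "2017-08-01T10:15:02Z",
    "DispenseCount": 3
}
```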

We take all of the telemetry data from the new and old hardware at all our sites and feed it into an Azure Event Hub, which allows us to perform multiple actions, such as archiving the data to Blob Storage using Azure Event Hub Capture and processing the streaming data using Azure Stream Analytics.

We decided we wanted to do some additional early stage testing with some of the next-gen hardware at a few sites. As part of this testing we also wanted to push the data for just specific hardware to a partner organisation we are working with. So how did we achieve this?

The first step was to set up another Event Hub. We knew this partner would have no issues consuming event data from a Hub, and it made Stream Analytics an obvious way to process the complete incoming stream and ensure that only the data for the dispensers and sites we specify is sent to the partner.

Stream Analytics has the concept of Reference Data which takes the form of slow-moving (or static) data that can be read from a blob storage account in Azure.

We identified our site and dispensers and created our simple Reference Data JSON file – sample below.
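Something of this shape, with illustrative property names (the key point is a site identifier alongside an integer array of dispenser ids):

```json
[
    {
        "SiteId": 1234,
        "Dispensers": [ 2001, 2002, 2005 ]
    },
    {
        "SiteId": 1235,
        "Dispensers": [ 2101 ]
    }
]
```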

The benefit of this format is that we can manage additional sites and dispensers by simply editing this file and uploading it to blob storage! Stream Analytics even helps us by providing a useful naming scheme for files, so you don’t even need to stop your Stream Analytics Job to update it! We uploaded our first file to a location that had the path of


In future, when we want to update the file, we edit it and then upload it to blob storage at, say


When the Job hits this date / time (UTC) it will simply pick up the new reference data – how cool is that?!

In order to use the Reference Data auto-update capability, you need to set up the path naming scheme when you define the reference data as an input to the Stream Analytics Job. If you don’t need the above capability, you can simply hard-code the path to, say, a single file.
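The naming scheme relies on the {date} and {time} substitution tokens in the reference data input's path pattern (the tokens are part of Stream Analytics itself; the container and file names below are made up for illustration):

```
Path pattern: dispenser-refdata/{date}/{time}/dispensers.json
Date format:  YYYY-MM-DD
Time format:  HH-mm

A blob uploaded to dispenser-refdata/2017-08-01/10-00/dispensers.json
becomes the active reference data at 2017-08-01 10:00 UTC.
```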

The final piece of the puzzle was to write a Stream Analytics Job that used the Reference Data JSON as one input and read the site identifier and dispensers from the included integer array. Thankfully, the built-in GetArrayElements function came in handy, along with CROSS APPLY, which lets us iterate over the array elements and use them in the WHERE clause of the query!
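A sketch of what such a query could look like; the input, output, and field names below are placeholders rather than our actual job definition:

```sql
-- Illustrative sketch only: the names below are placeholders.
SELECT
    telemetry.*
INTO
    [partner-eventhub]              -- Event Hub output for the partner
FROM
    [telemetry-input] telemetry     -- the complete stream from all sites
JOIN
    [dispenser-refdata] ref         -- the Reference Data JSON input
    ON telemetry.SiteId = ref.SiteId
CROSS APPLY
    GetArrayElements(ref.Dispensers) AS dispenser
WHERE
    telemetry.DispenserId = dispenser.ArrayValue
```

GetArrayElements surfaces each array entry as a record with an ArrayValue (and ArrayIndex) field, which is what makes the final WHERE clause possible.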

The resulting solution now handily carves off the telemetry data for just the dispensers we want at the sites we list and writes them to an Event Hub the partner organisation can use.

I commented online that this sort of solution, certainly one that scales as easily as this one will, would have been unachievable for most organisations even just a few years ago.

The cloud, and specifically Azure, has changed all of that!

Happy Days 😎


June 24: I’m talking about Azure HDInsight at Sydney ALT.NET

My colleague Jibin Johnson and I will be talking about Microsoft’s cloud Big Data story, based on Azure HDInsight and Power BI.

Come along and see how Microsoft is making use of the Power of the Elephant in the Cloud!


Not Your Father’s Cloud: Microsoft Azure HDInsight Explained

If you need to understand the format of this post, take a look at my introduction to the series.

Like A Boss

HDInsight is designed to help tame the Big Data beast by providing an on-demand Apache Hadoop solution hosted on Azure. You can create a cluster of between 1 and 32 nodes and use standard Hadoop tools like Pig and Hive, as well as extensions in your favourite tool, Excel. Note, however, that HDInsight isn’t a service you can just rock up and use: you’ll either need to be planning to (or already) be doing some serious number crunching, and have people who are familiar with Hadoop, to gain any value out of it.

Goes Well With

  • Transformation and analysis of large unstructured datasets that can be transported to, or accessed by, Azure.
  • SQL Server BI tooling – connect your SQL Server BI, Analytics and Reporting to Hive.

Open Other End

  • Not a replacement for High Performance Computing (HPC) solutions.
  • Structured data analysis may be better suited to SQL Server depending on dataset size.

Contents May Be Hot

  • Azure Blob Storage works out cheaper in the long term than using HDFS for data storage.
  • Not all the Hadoop tooling will work exactly as you expect.

Don’t Take My Word For It
