Easy Filtering of IoT Data Streams with Azure Stream Analytics and JSON reference data

I am currently working on an next-gen widget dispenser solution that is gradually being rolled out to trial sites across Australia. The dispenser hardware is a modern platform that provides telemetry data that can be used for various purposes by the locations at which the dispenser is deployed and potentially by other third parties.

In addition to these next-gen dispensers we already have existing dispenser hardware at the locations that emits telemetry that we already use for other purposes in our solution. To our benefit both the new and existing hardware emits the same format telemetry data 🙂

A sample telemetry entry is shown below.

sample-data.json

{
  "LocationId": "NSW2143",
  "DispenserNumber": 1,
  "WidgetTypeNumber": 24,
  "WidgetTypeName": "Demo Widget",
  "DispenserEventId": "3960157e-1dc5-482a-afa6-6a8e80bfa167",
  "ControlMode": 1,
  "EventType": 3,
  "EventTimeUTC": "2018-01-11T07:18:01+00:00"
}

We take all of the telemetry data from new and old hardware at all our sites and feed it into an Azure Event Hubs which allows us to perform multiple actions, such as archival of the data to Blob Storage using Azure Event Hub Capture and processing the streaming data using Azure Stream Analytics.

We decided we wanted to do some additional early stage testing with some of the next-gen hardware at a few sites. As part of this testing we also wanted to push the data for just specific hardware to a partner organisation we are working with. So how did we achieve this?

The first step was to setup another Event Hub. We knew this partner would not have any issues consuming event data from a Hub and it made the use of Stream Analytics an obvious way to process the incoming complete stream and ensure only the data for dispensers and sites we specify is sent to the partner.

Stream Analytics has the concept of Reference Data which takes the form of slow-moving (or static) data that can be read from a blob storage account in Azure.

We identified our site and dispensers and created our simple Reference Data JSON file - sample below.

reference-data.json

{
    "LocationId" : "NSW2143",
    "PartnerDispensers" : [ 1, 2, 3 ]
}

The benefit of this format is that we can manage additional sites and dispenser by simply editing this file and uploading to blob storage! Stream Analytics even helps us by providing a useful naming scheme for files so you don't even need to stop your Stream Analytics Job to update it! We uploaded our first file to a location that had the path of

/siterefdata/2018-01-09/11-40/sitedispensers.json

In future when we want to update the file, we edit it and then upload to blob storage at, say

/siterefdata/2018-02-01/00-00/sitedispensers.json

When the Job hits this date / time (UTC) it will simply pick up the new reference data - how cool is that?!

In order to use the Reference Data auto-update capability you need to set up the path naming scheme when you define the reference data as an input into the Stream Analytics Job. If you don't need the above capability your can simply hard code the path to, say, a single file.

The final piece of the puzzle was to write a Stream Analytics Job that used the Reference Data JSON as one input and read the site identifier and dispensers from the included integer array. Thankfully, the in-built GetArrayElements Function came in handy, along with CROSS APPLY which gives us the ability to iterate over the elements and use them handily in the WHERE clause of the query!

sample-query.asaql

SELECT
    RD.DispenserEventId AS EventId,
    RD.DispenserNumber,
    RD.LocationId,
    RD.EventTimeUTC AS Timestamp
INTO
    ehpartnerdata
FROM
    ehrawdispenserfeed RD
JOIN siteandispenserlist SDL ON RD.LocationId = SDL.LocationId
CROSS APPLY GetArrayElements(SDL.PartnerDispensers) As SiteDispenser
WHERE
    RD.EventType = 2
    AND
    SiteDispenser.ArrayValue = RD.DispenserNumber

The resulting solution now handily carves off the telemetry data for just the dispensers we want at the sites we list and writes them to an Event Hub the partner organisation can use.

I commented online that this sort of solution, and certainly one that scales as easily as this will, would have been something unachievable for most organisations even just a few years ago.

The cloud, and specifically Azure, has changed all of that!

Happy Days 😎