Not Your Father's Cloud: Microsoft Azure HDInsight Explained


If you need to understand the format of this post, take a look at my introduction to the series.

Like A Boss

HDInsight is designed to help tame the Big Data beast by providing an on-demand Apache Hadoop solution hosted on Azure. You can create a cluster of between 1 and 32 nodes and use standard Hadoop tools like Pig and Hive, as well as extensions in your favourite tool, Excel. Note, however, that HDInsight isn't a service you can just rock up and use: to get any value out of it, you'll need to be planning (or already doing) some serious number crunching, and to have people who are familiar with Hadoop.

Goes Well With

  • Transformation and analysis of large, unstructured datasets that can be transferred to, or accessed from, Azure.
  • SQL Server BI tooling - connect your SQL Server BI, Analytics and Reporting to Hive.
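
To give a flavour of the "transformation and analysis of large unstructured datasets" that Hadoop is built for, here's a minimal MapReduce word count in Python. It's a sketch only: the function names (mapper, reducer, run_locally) are hypothetical, and everything runs in a single process, with sorted() standing in for the shuffle/sort phase a real cluster performs between the map and reduce steps. On an actual cluster you'd more likely express this in Pig or Hive, or ship the mapper and reducer as scripts via Hadoop Streaming.

```python
from itertools import groupby
from typing import Dict, Iterable, Iterator, Tuple


def mapper(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Map step: emit (word, 1) for every whitespace-separated word."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def reducer(pairs: Iterable[Tuple[str, int]]) -> Iterator[Tuple[str, int]]:
    """Reduce step: sum the counts for each word.

    Assumes the pairs arrive sorted by key, which is what Hadoop's
    shuffle/sort phase guarantees on a real cluster.
    """
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield word, sum(count for _, count in group)


def run_locally(lines: Iterable[str]) -> Dict[str, int]:
    """Hypothetical helper: simulate map -> shuffle/sort -> reduce in one process."""
    shuffled = sorted(mapper(lines))  # stands in for the cluster's shuffle/sort
    return dict(reducer(shuffled))


if __name__ == "__main__":
    sample = ["the quick brown fox", "The lazy dog", "the end"]
    print(run_locally(sample))
```

The split into a stateless map and a per-key reduce is what lets Hadoop spread the work across however many nodes are in the cluster.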

Open Other End

  • Not a replacement for High Performance Computing (HPC) solutions.
  • Structured data analysis may be better suited to SQL Server, depending on the size of the dataset.

Contents May Be Hot

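  • Azure Blob Storage works out cheaper than HDFS for data storage in the longer term: data in Blob Storage survives the cluster being deleted, so you only pay for the cluster while you're actually processing.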
  • Not all the Hadoop tooling will work exactly as you expect.
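
The Blob Storage point comes down to simple arithmetic. Data kept in HDFS lives on the cluster's own nodes, so the cluster has to keep running (and billing) just to hold the data, whereas data in Blob Storage outlives the cluster. The back-of-the-envelope sketch below compares the two under that assumption. All rates are placeholder parameters, not real Azure prices, the function names are hypothetical, and the model deliberately ignores transaction and data-transfer charges.

```python
HOURS_PER_MONTH = 730  # 8,760 hours a year / 12, a common monthly billing approximation


def hdfs_monthly_cost(nodes: int, node_hourly_rate: float) -> float:
    """Data lives on the cluster's disks (HDFS), so the cluster must stay up
    all month to keep it. Local disk is assumed to be bundled into the node rate."""
    return nodes * node_hourly_rate * HOURS_PER_MONTH


def blob_monthly_cost(nodes: int, node_hourly_rate: float, compute_hours: float,
                      stored_gb: float, gb_monthly_rate: float) -> float:
    """Data lives in Blob Storage: pay for storage all month, but only pay
    for the cluster during the hours jobs are actually running."""
    compute = nodes * node_hourly_rate * compute_hours
    storage = stored_gb * gb_monthly_rate
    return compute + storage


if __name__ == "__main__":
    # Made-up figures for illustration only.
    print("HDFS:", hdfs_monthly_cost(nodes=4, node_hourly_rate=0.50))
    print("Blob:", blob_monthly_cost(nodes=4, node_hourly_rate=0.50, compute_hours=40,
                                     stored_gb=1000, gb_monthly_rate=0.02))
```

With those made-up figures (4 nodes at 0.50 per node-hour, 40 hours of processing a month, 1,000 GB stored at 0.02 per GB-month) the HDFS option comes to 1,460 against 100 for Blob Storage. If the cluster runs around the clock anyway, the advantage disappears, because Blob Storage then adds its storage charge on top of the same compute bill.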

Don't Take My Word For It