Azure SQL Database - Dynamic Data Masking Walkthrough

Published on
Reading time
Authors

Microsoft recently announced the public preview of the Dynamic Data Masking (DDM) feature for Azure SQL Database that holds a lot of potential for on-the-fly data obfuscation that traditionally would have required either custom business logic or third party systems.

In this post I am going to take the opportunity to walk through how we can set this up for an existing database. For the purpose of this post I am going to utilise the AdventureWorks sample database for Azure SQL Database which you can download from Codeplex.

Our Azure SQL Database Server

Firstly we need to understand our Azure SQL Database Server instance configuration. At time of writing any existing Azure SQL Database Server or one created without specifying a Version flag will use the "2.0" Azure SQL Database engine which does not support DDM.

If this is your scenario the you will need to upgrade your server to v12 using the Azure Management Portal - make sure you follow the planning guidance provided by Microsoft. Note that an upgrade has minimal impact on running databases - there will be brief interruption to service at the very end of the upgrade, but for the rest of the process your database will remain online.

I'm going to setup my server using the following PowerShell, but if you want to specify a server name them you will need to use the preview portal (https://portal.azure.com/). Note the Version parameter I'm providing to specify the new engine release.

# Should return 0.8.14
(Get-Module Azure).Version

New-AzureSqlDatabaseServer -Location "West US" \
                           -AdministratorLogin bobsyouradmin \
                           -AdministratorLoginPassword N0tRe4l? \
                           -Version 12.0

Notes:

  • At the time of writing there is a known issue with this Cmdlet due to the time it takes to complete exceeding the maximum PowerShell timeout. The server will still be created but you will receive an error.
  • Make sure you open up the firewall on the server to allow your client machine to connect. You can do this via the Azure portal or PowerShell.

After this has run I'll have an Azure SQL Database instance with the v12 engine ready for use. My next step is to deploy the Adventureworks database schema and data. We can do this leveraging the downloaded package (from Codeplex) and then running the following command at an Admin-elevated command prompt.

CreateAdventureWorksForSQLAzure.cmd {servername}.database.windows.net {user}@{servername} {password}

Once this command completes we will have a database setup that we can now query. I'm going to simply use SQL Server Management Studio (SSMS) for this. You will need a version recent enough to support Azure SQL Database (SSMS for SQL Server 2012 and 2014 will do).

This is the test query we'll use to show how data masking works. I'm going to run the initial query as our admin user ('bobsyouradmin').

SELECT TOP 10 BusinessEntityID, FirstName, MiddleName, LastName
FROM Person.Person

Which results in the following.

BusinessEntityIDFirstNameMiddleNameLastName
285SyedEAbbas
293CatherineR.Abel
295KimNULLAbercrombie
2170KimNULLAbercrombie
38KimBAbercrombie
211HazemEAbolrous
2357SamNULLAbolrous
297HumbertoNULLAcevedo
291GustavoNULLAchong
299PilarNULLAckerman

Getting the Mask On

Now that we've gotten everything setup, let's go and mask the LastName field. Firstly we need to switch on Dynamic Data Masking for our Azure SQL Database instance via the Azure Portal.

Navigate to the Azure SQL Database instance and then select the Auditing & Security tab on the very right.

This is the Audit and Security Option

Then scroll down to the bottom and switch on Dynamic Data Masking as shown below. I'm going to add our current login as a privileged login too (this means it can query the data without a mask).

Data Mask Setup

Now at this stage if I re-run the query above with the same credentials then I will get back exactly what I have above which is expected because I allowed my admin user to be excluded from the masking (and I'm not using a secured connection string).

Let's go ahead and create another login and grant it access to the database - we'll just make it a data reader so that it can run the SELECT query but not update anything.

Firstly create a login for the server

CREATE LOGIN untrusteddev WITH PASSWORD = 'N0tL0Lc4t5^'
GO

Then a user in the Master and AdventureWorks databases

CREATE USER untrusteddev FOR LOGIN untrusteddev WITH DEFAULT_SCHEMA = [dbo]
GO

-- Add user to the database data reader role (not needed in Master DB)
EXEC sp_addrolemember N'db_datareader', N'untrusteddev'
GO

Now let's close our connection to the Azure SQL Database and create a new one with our new login.

As we don't trust the developer we're going to give him a special connection setting to use. Instead of the standard

{servername}.database.windows.net

we will provide the developer with

{servername}.database.**secure**.windows.net.

We will turn off the ability for clients to connect via the old un-secured hostname as well.

Before we give our developer access to the database let's make a tweak by adding a mask to our Person table. On the Auditing & Security page for our database I'm going to click "Add Mask" in the bottom menu and then add the mask as shown below.

Mask Rule

Now that this is place we can provide our developer with the necessary connection details and they can connect via SSMS like so:

Secured Connection

When the developer logs in everything looks pretty much as it always does.

Let's pop open a new Query Window in our AdventureWorks database and re-run our SQL query on the Person table from above.

SELECT TOP 10 BusinessEntityID, FirstName, MiddleName, LastName
FROM Person.Person

So can we see the full Person details as before?

and our result set looks like:

BusinessEntityIDFirstNameMiddleNameLastName
285SyedEXXXXXXXX
293CatherineR.XXXXXXXX
295KimNULLXXXXXXXX
2170KimNULLXXXXXXXX
38KimBXXXXXXXX
211HazemEXXXXXXXX
2357SamNULLXXXXXXXX
297HumbertoNULLXXXXXXXX
291GustavoNULLXXXXXXXX
299PilarNULLXXXXXXXX

The key take away from this is that we haven't actually changed the underlying data - the database does still contain the LastName of the Person - it's just that the user context of the call is such that a configured mask is applied to the data as it is retrieved from the server.

We can use this approach for a lot of different data types and I expect eventually you'll have a lot more customisation you can apply here. This is very much a "v1" release and will no doubt be added to over time. This is a great feature that may eventually remove some need to perform mass data cleansing operations when populating dev/test data sources with "production" data.