Azure DevOps Masking a.k.a “point, no click”

“[My] kids haven’t responded to my GDPR requests so I don’t think I’m legally allowed to tell them when dinner’s on the table.”
@mrdaveturner

Ah masking. You would have thought I’d be sick of it by now, no? No, fortunately now, more so than ever, I find myself answering question after question and tackling use-case after use-case. So when I was asked this week:

“Chris, is there a way for us to call Data Masker for SQL Server directly from Azure DevOps?”

I thought to myself, well that sounds easy enough… and it was! I know what you’re thinking, c’mon Chris, surely there is more to it? But no, it’s actually pretty straight forward!

I pointed them at the PowerShell module and cmdlets for SQL Provision and the Azure DevOps plugin to automate all of their Provisioning and Masking process, thinking all the while “pffft, they could have made this harder!” and then…

“No sorry Chris, is there a way for us to call JUST Data Masker for SQL Server directly from Azure DevOps?”

Ah! Now that’s an interesting one!

#1 Figure out where you want Data Masking to run in your process

This empty Azure deployment stage looks good enough for now! If you wanted to chain other processes either side of it, that’s cool too! Maybe you have your own provisioning process in place and you want to point Data Masker at it to sanitize it? Makes sense to me! For now I’m going to stick with a single agent job for simplicity.

#2 Figure out what is actually going to run Data Masker

Data Masker is a client install and as such will need to be installed on a *gasp* actual machine!

No but seriously, any server you have lying around, physical or VM will do the trick as long as it meets these requirements. Now this Server/VM will need to have an Azure DevOps agent on it already, which of course is the ideal candidate for being the “thing” that calls Data Masker – this could be the Staging/Non-Functional/Pre-Prod environment also of course, so you could copy down PROD and then immediately invoke masking.

#3 Call the command line from Azure DevOps

In your pipeline steps you can specify the calling of an executable on the machine where the agent resides. Fortunately Data Masker has a wonderful command line available that you can call, you can read all about it here: https://documentation.red-gate.com/dms/data-masker-help/general-topics/about-command-line-automation

The PARFILE you could of course dynamically replace with variables so that it only calls the relevant parameter file for that particular database as well, a nice benefit!

My PARFILE just simply looked like this:

It was calling a local Data Masker set “AzureFun” – now the thing to bear in mind is that Data Masker will run with the Windows authentication credentials that are being run as by the Azure DevOps agent, unless you specify otherwise. In this case because the Azure DevOps agent has the correct permissions to update the databases on this instance anyway I’m fine to use Windows Authentication:

Conclusion

It’s very easy to simply call the command line of Data Masker for SQL Server directly from Azure DevOps, does this same approach work from other CI/CD tools? If they can call executables on the target server then absolutely! So it’s very easily included in the process – you just have to think about where Data Masker is installed and what credentials you’re using for it!

Bonus Point – what about if it’s all Azure SQL Database?

You had to do it didn’t you, you had to say it!

“But Chris, now we know we can call this all from Azure DevOps, what if we wanted to mask and copy Azure SQL Databases into Dev/Test etc.?”

Well actually the good thing is, it’s also pretty similar! When you’re connecting Data Masker to an Azure SQL DB you only need to specify this in the connections in the controller. Again, authentication will likely have to be SQL Auth at this point, and you need to be in Cloud mode, and I’d recommend setting the connection timeout to 10s rather than the standard 5s, but it can still be called as normal from the PARFILE:

So the Data Masker element is reasonably straight forward – that’s the good news. But the thing you REALLY need to stop and think about is:

Where are our Dev and Test copies going to BE?

Option #1: If they’re going to be on VMs or local dev and test servers / developer machines then you could follow a similar approach to one I laid out in this blog post for Redgate in which you create a BACPAC file and split it out on premise before importing it and then provisioning from there. And you could use this code in my Github to achieve something very similar. Caveat: I am no PowerShell guru, who do you think I am? Rob Sewell? Chrissy LeMaire? No. Sadly not. So you can build your own logic around my code though, have at it, I don’t mind! ^_^

Option #2: Keeping everything in Azure. You can copy databases around in Azure and it seems to work pretty well! So I wrote this PowerShell (also in my GitHub for y’all) to effectively copy a PROD DB into the same resource group, mask it and then copy it across to a Dev/Test resource group, dropping the temp copy so as not to incur lots of extra Azure costs (this is just one of the methods I’ve seen people use, again it’s up to you!) – again, see the caveat in option #1 above for my statement on PowerShell! The good thing is, you can use the ‘&’ simply from PowerShell to call Data Masker’s command line.

Either of these options can be run from Azure DevOps also as part of your provisioning or working processes, but instead of including a call to the command line, you can run a fun PowerShell script instead:

Second Conclusion *sigh*

There are lots of ways to get what you need into Dev and Test, but these copies should be masked if they contain personal, identifying information. There are some methods above but there are plenty of others out there on the internet and if you’re not sure about getting started with data masking; try my post here – happy masking!

Cloning From Home… a consideration.

“I know there’s a proverb which that says ‘To err is human,’ but a human error is nothing to what a computer can do if it tries.”
– Agatha Christie

Working from home is fine because we still have the right connections in place. Televisions to watch only the most stimulating and educational shows, telephones to get instantly in touch with those we hold most dear and crucially the internet to do everything we can to still be as (and sometimes more) productive at work as possible.

But it’s happening more and more now. We have to work from home and this is starting to turn up some problems.

And no. I don’t just mean that my wife is now acutely and accurately aware of how annoying I am.

Judge Judy Reaction GIF

I’m talking about clones. Specifically SQL Clone, and clones do not work great from home.

Now for anyone out there who has never heard of SQL Clone, where have you been?? It’s an incredibly intuitive, reliable tool for rapid provisioning of database copies (*cough* definitely not just lifted from the website) – in any case, it’s pretty darn cool.

One of the coolest things about this technology though is how it seamlessly plugs into the Microsoft Ecosystem by leveraging the VHD(X) technology available in x64 Windows. This means there’s no special file systems, no hard or software “appliances”, it’s very much plug and play (with a little tinkering around ports 14145 and 14146 of course!)

Naturally though it does come with it’s challenges, as does all technology, and it has been highlighted recently more so than ever.

The Problem

Whilst SQL Clone is a beautiful, elegant and easily automated answer to the database provisioning problem – there are 2 gotcha’s that you must be aware of when you start using the solution. SQL Clone relies on the relationship between the Image file (the centralized, virtualized parent, as it were that exists on a windows fileshare) and the Clones themselves (the 40mb diff disks on the hosts).

Now many people choose to put Clones onto their workstations, which for many of us is our laptops. We have SQL Server Developer installed and we pull down a few clones for different branches, for testing etc. etc. and all is well with the world.

When you’re in the office that is.

When a user queries or works with a SQL Clone and it requires any of the schema/data that was in the original copy and is not in the changes the user has made, a call is made back to the image file (the VHD mounted copy) to fetch it. When you’re in an office setting (with your cup of coffee in one hand and a colleague sat next to the other telling you about their weekend) this is fine because you’re connected directly to the company network and therefore the link between the clone on your laptop and the image file on the fileshare is short, strong and stable.

At home though this isn’t the case. Many of us work on VPNs that are “ok” on internet connections that can only be described as temperamental. So what happens when a clone tries to call back to the image file across this VPN, which is now much further away, across a sea of uncertainty and poor connection?

Bad things happen. Either the Clone cannot connect to the Image and it decides it no longer knows what it is, and it falls into Recovery Pending for a while until the connection is re-established, or if the connection is present it is just so slow.

sloth dmv GIF

This isn’t a fault of the tool, I hasten to add, it is just the nature of the technology. Much as we would expect our RDP sessions to be a little laggy if we were based in Greenland and the server was in Australia, it is just part of life.

So… are there solutions? You bet there are!

Option 1: The “jump box”

Many people I have worked with have found that they are still able to leverage SQL Clone whilst working from home by reducing the physical distance between the Clone and the Image, and have done so by introducing a Dev/Test “jump box”.

The way this works is by having an instance of SQL Server available within the company network onto which the Clones are provisioned.

This works great because it means that the link between the Clone and the Image is once again short and strong and stable, relying on the company network, but you can easily connect to or RDP onto this jump box if you need to work with them. Still using your VPN and internet from home? Check! Able to work with a Clone now though? Check!

Option 2: The Cloud

Welcome… to the world of tomorrow!

season 1 sb 129 GIF by SpongeBob SquarePants

By which of course I mean, the world that has been available to us for a while. Infrastructure as a Service (or IaaS) allows us to very easily spin up an additional dev/test server which can be used in the interim. Whilst this does incur a little extra cost in infrastructure it still means that you won’t need as much space on the VMs themselves, thank you SQL Clone!

As long as you have a fileshare available in say Azure or AWS and a Windows VM you’re pretty much good to go and once you’ve got Clones and Images up in the Cloud, you’re reliant on the networks of the providers, and I’ve got to say, they’re not too shabby at all!

Sneaky side note: ALL demos I give of SQL Clone actually run on an EC2 VM with a 91GB Image on a fileshare and it works great!

Option 3: Roll-your-own solution

Ok. I realize this one is not really a solution. But the thing we’ve highlighted here is that it’s all about the distance between the Clone and the Image. Those two love-birds cannot exist long distance.

So if you might have another ingeniously simple way of solving this, I would recommend having a read of this magnificent document written by an incredibly clever ex-colleague of mine and the “How It Works” documentation and then let me know what YOU come up with:

Links: Best Practices for SQL Provision and How It Works

Conclusion

There are a few ways of getting around this problem but they’re not always obvious or clear, and at this trying time, you just want solutions you have in place to work. We all do.

So if you are working with any of Redgate’s tools, or even if you just have a question that I or a colleague might be able to help with – please reach out to us, talking to us won’t cost you anything, but it sure as heck might gain you something!

Stay safe, stay well and have a great week!

Me on Twitter: @PlantBasedSQL
Redgate on twitter: @Redgate
Redgate Support Team: support@red-gate.com