Shall we begin? (With data classification)

“If I had an hour to solve a problem I’d spend 55 minutes thinking about the problem and 5 minutes thinking about solutions.”
Albert Einstein

So I just spent about 20 minutes trying to come up with a suitable title for this blog post, and then it struck me – one of my favorite movies of all time (which I will soon be ensuring my wife and I watch again) is Star Trek Into Darkness, featuring the magnificent Benedict Cumberbatch, who masterfully growls the phrase “shall we begin?”.


This sums up perfectly where people find themselves at the beginning of their classification activities. “Where do I start?” is usually the first question I get asked – regardless of whether you’re using an Excel sheet, Azure Data Catalog or Redgate SQL Data Catalog to carry out this process, you will find yourself in the same place, asking the same question.

Classifying your SQL Server tables and columns is not straightforward, I’ll say that up front – you have to be prepared for whatever may come your way – but give yourself a fighting chance! Whether you’re looking to better understand your data, protect it, or prepare your business to better deal with things such as Subject Access Requests (SARs), the Right to be Forgotten *cough* I’m looking at _you_ GDPR *cough* – or even just day-to-day development in systems containing sensitive information – this is the ultimate starting point. As per my blog post on data masking here, you can’t protect what you don’t know you have.

This is my effort to give you the best possible start with your classification process, whether this feeds into a wider data lineage process, data retention or, of course, data masking. So… shall we begin?

Get a taxonomy set up

This is perhaps the most crucial part of your success. Even the best classification process in the world means little if you’ve only described your data in one of, say, three possible ways. The thing to bear in mind before getting started is that the data cataloging process is not specific to one job.

You may think at this point in time that you’re going to use it to highlight what you want to mask for Dev/Test environments, or maybe it’s your hit list for implementing TDE or column level encryption – but this _thing_ you’re building is going to be useful for everyone.

  • DBAs will be able to use this to help them prioritize the systems they look after and be more proactive when it comes to security checks, updates, backups etc.
  • Developers will be able to use this to better understand the tables and environments they are working on, helping them contextualize their work and therefore engage and work with any other teams or individuals who may be affected or who may need to be involved.
  • Governance teams and auditors will be able to use this to better understand what information is held by the business, who is responsible for keeping it up to date and how it is classified and protected.

The list goes on.

So all of the above will need to be engaged in a first run to help describe the data you’re working with. What do you actually care about? What do you want to know about your data at first glance? Below is the standard taxonomy that comes out of the box with Redgate’s SQL Data Catalog:

Some of my favorites are in here, which I would encourage you to include as well! If nothing else, having Classification Scope as a category is an absolute must – but I’ll come to this soon. You can see, though, how being able to include tags such as who owns the data (and is therefore in charge of keeping it up to date), what regulation(s) it falls under, and even what our treatment policy is in line with those regulations, gives us so much more to go on. We can be sure we are appropriately building out our defensible position.

Having a robust Taxonomy will enable you to not only know more about your data but to easily communicate and collaborate with others on the data you hold and the structure of your tables.

Decide who is in charge

This seems like an odd one, but one of the most common questions I get is about who will be carrying out the classification process, and this is where the true nature of collaboration within a company becomes absolutely critical.

Some people believe that a DBA or a couple of developers will suffice, but as you’ll see later on, this is not a simple process that only one or two people can handle by themselves. Be prepared to spend hours on this; implementing classification is, by its nature, a team effort in the first instance.

You will need representation from people who know the database structure, people who know the function of the various tables, and people who know the business and how data should be protected. That means representation from, and collaboration between, Dev, DBAs, Testers, Governance and DevOps – plus someone central to coordinate the effort. When you have key representation from these teams, it will be much easier to identify and collaborate on hot spots of data, so make sure you have this knowledge up front.

Get rid of what doesn’t matter

You may be surprised that the next step is technically an execution step, but it is an important point nonetheless and will absolutely help with the classification effort. This is where the Classification Scope category comes in, and this is why it’s my favorite.

One of the biggest problems people face when actually executing on their classification is the sheer enormity of the task. There is no “average” measure we can rely on, unfortunately, and even small schemas can be far from insubstantial – recently, a customer provided me with just ONE of their database schemas, which had well in excess of 1,800 columns across dozens of tables. When you scale that to potentially hundreds of databases, it rapidly becomes clear that going over every single column by hand is unmanageable.

To start, then, the knowledge brought by the team mentioned above will be invaluable, because we’re going to “de-scope” everything that is not relevant to this process. It is very rare to find a company where more than 50% of columns per database contain PII/PHI – and even if you are one of those companies, this process can still help you.

There could be many reasons that something shouldn’t be included in this process. Perhaps it is an empty table that exists as part of a 3rd party database schema, such as in an ERP or CRM solution. It could be a purely system-specific table that holds static/reference data or gathers application-specific information. Regardless of what the table is, use the knowledge the team has to quickly identify these and then assign them all the necessary “Out of Scope” tag.

This will not only significantly reduce the number of columns you’re going to need to process, but will give you greater focus on what does need to be processed. One of the greatest quotes I’ve heard about this process comes from @DataMacas (a full-on genius, wonderful person and someone who over the years I have aspired to learn as much from as possible), who referred to it as “moving from a Battleships-style approach to one more akin to Minesweeper”. Which is just so incredibly accurate.

In my example database below, with only 150-odd columns, using the “Empty Tables” filter and then filtering down to system tables I know about, I was able to de-scope just under half of the database as a starting point:
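As a rough sketch of what that “Empty Tables” filter does under the hood (SQL Data Catalog handles this for you – the query below is purely illustrative), you can list zero-row user tables straight from the system catalog:

```sql
-- Candidate "Out of Scope" tables: user tables with zero rows in every partition
SELECT s.name AS [schema], t.name AS [table]
FROM sys.tables AS t
JOIN sys.schemas AS s ON s.schema_id = t.schema_id
WHERE t.is_ms_shipped = 0
  AND NOT EXISTS (
      SELECT 1
      FROM sys.partitions AS p
      WHERE p.object_id = t.object_id
        AND p.index_id IN (0, 1)   -- heap or clustered index row counts
        AND p.rows > 0
  );
```

Anything this returns is a good first candidate for the “Out of Scope” tag, pending a sense check with the team.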

Figure out patterns and speed up

Many of the people carrying out this process will already have some anecdotal knowledge of the database as I’ve already mentioned, but now it’s time to turn this from what _isn’t_ important, to what is.

The fastest way to do this is to build up some examples of column naming conventions you already have in place across multiple databases – there will likely be columns with names that contain things like Name, Email or SSN in some format. Both SQL Data Catalog from Redgate and Microsoft’s Azure Data Catalog have out-of-the-box suggestions that look at your column names and propose what might be sensitive, for you to check and accept the classification tags.
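Outside of either tool, you can do a quick first pass over those naming conventions straight against the metadata – the patterns below are just examples to adapt to your own schemas:

```sql
-- Shortlist columns whose names suggest common PII, for manual review
SELECT TABLE_SCHEMA, TABLE_NAME, COLUMN_NAME, DATA_TYPE
FROM INFORMATION_SCHEMA.COLUMNS
WHERE COLUMN_NAME LIKE '%Name%'
   OR COLUMN_NAME LIKE '%Email%'
   OR COLUMN_NAME LIKE '%SSN%'
ORDER BY TABLE_SCHEMA, TABLE_NAME;
```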

Now, these suggestions are incredibly helpful, but they both have a reduced scope because they’re just matching against common types of PII, so it’s important to customize them to better reflect your own environments. You can do this fairly easily, in one or both of the following ways:

1 – Customize the suggestions

In Redgate’s SQL Data Catalog you can, prior to even looking at the suggestions and accepting them, customize the regular expressions that are run over the column naming conventions so that they are more indicative of your own schemas – either by editing existing rules or by creating your own and choosing which tags should be associated with the matching columns:

You can then go through and accept these suggestions if you so wish, obviously making sure to give them a sense check first:


2 – Use the PowerShell API

In both of the aforementioned solutions you can call the PowerShell API to carry out mass classification against columns with known formats – this allows you to rapidly hit any known targets and further reduce the amount of time spent looking directly at columns. For example, with the SQL Data Catalog PowerShell you can classify any columns where the name is like Email but not like ID (as primary and foreign keys will, in most cases, fall under the de-scoping work we did above) with a tag for sensitivity and information type (full worked example here):
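The matching rule itself is simple – expressed here as plain T-SQL just to illustrate the logic (the real tagging happens through the catalog’s PowerShell API; the worked example linked above shows the actual cmdlets):

```sql
-- Columns to tag: name looks like an email field, but exclude key columns
-- (EmailID-style primary/foreign keys were handled by the de-scoping work)
SELECT TABLE_SCHEMA, TABLE_NAME, COLUMN_NAME
FROM INFORMATION_SCHEMA.COLUMNS
WHERE COLUMN_NAME LIKE '%Email%'
  AND COLUMN_NAME NOT LIKE '%ID%';
```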

Finally – get classifying

This is the last stage – the “hunt” stage. It’s time to get going with classifying what’s left, i.e. anything that wasn’t de-scoped and wasn’t caught by your default suggestions or PowerShell rules.

You can obviously go through each column one by one, but it makes the most sense to start by filtering down to the tables with the highest concentration of columns (i.e. the widest tables), or to the columns specifically known (anecdotally or by number of tags) to contain sensitive information, and classifying those first – at the very least as in or out of scope, what information they hold, who owns it and what the treatment intent is.

The approach I take in this instance is to use filtering to its utmost – in SQL Data Catalog we can filter by table and column names, but also by data type. Common offenders can be found with certain NVARCHAR, XML or VARBINARY types, such as NVARCHAR(MAX) – to me, that sounds like an XML, JSON, document or free-text field, which will likely contain some kind of difficult-to-identify but ultimately sensitive information.

Following the NVARCHAR classification I move on and look at DATETIME and INT/DECIMAL fields for any key candidates like dates when someone attended an event or even a Date of Birth field. This helps especially when the naming conventions don’t necessarily reflect the information that is stored in the column.
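If you want to run this data-type hunt outside the catalog as well, the same filters translate directly to the metadata views (the types listed are just the usual suspects from above):

```sql
-- Free-text/document columns first, then date and numeric key candidates
SELECT TABLE_SCHEMA, TABLE_NAME, COLUMN_NAME, DATA_TYPE, CHARACTER_MAXIMUM_LENGTH
FROM INFORMATION_SCHEMA.COLUMNS
WHERE (DATA_TYPE IN ('nvarchar', 'varchar') AND CHARACTER_MAXIMUM_LENGTH = -1) -- the (MAX) types
   OR DATA_TYPE IN ('xml', 'varbinary', 'datetime', 'datetime2', 'date', 'int', 'decimal')
ORDER BY TABLE_SCHEMA, TABLE_NAME;
```

The INT/DECIMAL net is deliberately wide – the point is to produce a review list, not a final classification.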

Finally, one thing to add is that you will need access to the databases or tables at some point. You can’t truly carry out a full-on data classification process purely against the schema, especially for the reason above. Data will often exist in places you weren’t aware of, and knowing the contents, format and sensitivity of the data can only reasonably be found and tagged if you have the support of the data team to do so.


This is not a one time thing. The initial classification is going to be the hardest part but that can be mitigated if you follow some of the processes in this post and ultimately work as a team.

Classification, though, is ongoing. It is there to be an evergreen solution – something that provides context for any data governance or DevOps processes you have in place – and should therefore be maintained and treated as what it is: the place everyone can go to gather information about what they will be using, and what the company might have at risk.

Saying “no” doesn’t mean you don’t care

“One of the most painfully inauthentic ways we show up in our lives sometimes is saying “yes” when we mean “no,” and saying “no” when we mean “hell yes.””
Brené Brown

As someone who loves to please others, I often find that I end up with a calendar that looks like someone is very good at playing Tetris – meetings, demonstrations and training as the blocks – and my days fill up dramatically, to the point where finding white space or even a lunch break is difficult. Sound familiar?

Let me be clear, saying “yes” is one of the most powerful statements you have to offer – it is a clear indication that you want to help someone, or that you care about what they are doing. Yes is a profoundly positive and exciting word.

But what happens when you say yes one too many times?

  • Meetings start to pile up, meaning you are running from one to the other and not getting time to focus on your core responsibilities
  • Meetings start eating into your personal time “oh I was hoping to leave at X o’clock but I can probably push that a bit”… an hour… two hours…etc.
  • The quality of your work slips because you’re spinning too many plates and only able to give an ever-decreasing share of your time to individual tasks
  • Conflicts occur more easily when people book in follow up meetings at the same times as other meetings you’ve said yes to
  • People’s expectation of you shifts – not knowing that you are overburdened they expect you to thrive on that sort of “busy” and will continue to come to you with requirements

The negative ramifications of saying yes can be many, which leaves us in a fundamentally twisted catch-22, where what is supposed to be your most positive word can actually harm your work, your personal relationships and even your mental health.

Now I’m not saying that you should be asserting “no” to everything someone asks – it is very much a case of how you communicate. Communication is the best way of handling these sorts of situations because, and you may often forget this (but it’s worth reminding yourself), people are just that: people.

No one is going to fly off the handle if you tell them “I can appreciate your situation and I understand why you require me for this, however I’m currently at capacity and my current priorities are dealing with X and Y instead.” There is nothing stopping you from also suggesting other avenues they may be able to go down, or other people they may be able to speak to, to help them with their problem/meeting etc.

One of my favorite phrases in the world is “when you walk a mile in someone else’s shoes” (then you’re a mile away and you have their shoes! haha), but the key is not trying to do the same job or answer the same queries as someone – it’s merely empathizing with them and seeing it from their perspective. So when you are struggling with workload, look for proactive solutions to help, but don’t feel you have to put yourself in harm’s way to do so.

Conversely, if you work with someone who is constantly overburdened or chasing their own tail – consider, what can you do to help them? If you have additional overhead, be proactive and see if there’s anything you can do to share problems or thoughts.

A phrase I always hear when working with people to implement DevOps processes for their databases is “I simply don’t have time for this at the moment”, despite the fact that the very solutions we’re discussing – whether they are related to tooling or processes – will save them time and help them get a handle on their current and future workloads. In many situations, this can be addressed simply by breaking down some of the silos they’re experiencing in their company, and asking for help.

The top three tips I can offer to people who are afraid of saying no, or who flat-out refuse to do so, are these:

  1. Communicate. Tell people you feel overwhelmed, help them understand why your current load is too much so they understand what you’re up against.
  2. Ask for help. Many consider this to be a sign of weakness, but the most successful people aren’t the ones who do everything by themselves – they are the ones who know how to tap into the empathy, expertise and knowledge of those around them.
  3. Manage your time better. Revisit your priority list and identify what you can delegate or ask for help on. Make time for yourself to catch up, to do admin work or to blog, for instance! Definitely take time for things like lunches and leaving on time (even block them out in your calendar provisionally), because allowing your brain to break away from the constant pace of everyday life inspires creativity and problem solving, and will actually better prepare you for the challenges ahead.

So no. Saying “no” does not mean you don’t care – in fact it means the opposite: you do care. You’re prioritizing, doing your best to ensure the demands of the job are met, and most importantly you’re putting your own health first.

Stop calling yourself an idiot.

“Be you, love you. All ways, always.”
Alexandra Elle

You’ve probably heard it a lot; in the workplace, at home, from yourself. The dreaded phrase “oh sorry, I’m just being an idiot.” It doesn’t always have to be “idiot”, it can be “moron”, “I’m being stupid”… the list goes on and on.

I’m going to say something you may or may not like; whether you do or not is irrelevant, as it does not diminish its truth.

You are not an idiot. Nor have you ever been one.

Many people use this phrase to excuse mistakes they make, or to emphasize that they knew better but followed a gut instinct, forgetting the best or ‘correct’ way of doing it – but that is neither stupidity nor idiocy. It is being human.

I work with so many intensely clever people – not just when it comes to knowing about DevOps or SQL Server, but about things in general. Regardless of what it is you know, you know something – in many cases more about it than most people around you – and you should take pride in that. I’m not by any means insisting that you should be arrogant or full of yourself, but you should be confident about the things you know and the experiences that got you to this point.

Thinking less of yourself for simple mistakes (and that’s all they are – small, easily rectified things) is damaging not only to other people’s perceptions of you, but to your perception of yourself. The more you repeat this to yourself like a mantra, the more you undermine your self-belief and self-love; you are making a very simple but very effective statement to the world that you are not worthy.

Do you really believe that? If so, then it is time for introspection and a more fundamental soul searching exercise to lead yourself to acceptance and contentment. My feeling is though that 99% of people reading this will know that they are worthy, both of the love of other people as well as the love of themselves.

It is deeply rooted in the language we use and is an observed behavior that we grow up with and adopt into our own personal idiosyncrasies, so it is time to change the language we use about ourselves. Take each mistake or negative feeling you have about your own knowledge, observations and/or performance, and simply change the way you describe it to yourself – this can have a huge impact on how you remember and feel about that event. Challenge the use of negative terminology and use positive, self-affirming phrasing instead – you can find some great examples of this here.

I’ll give you a key example, as I am very guilty of doing this myself, in the hope that a personal context will help you more easily identify where you can give yourself some more love. Yesterday I had a meeting with the wonderful Kendra Little (who I have already spoken about a number of times on here, but yet again she comes to the rescue) where we were discussing an upcoming webinar that we’ll be conducting together. I asked Kendra for some additional time for us to sync up later in the week so we could discuss the format for the webinar and do a run-through, and said (verbatim):

“I need to know roughly when each of us should be talking, because whilst I would naturally be more quiet and let the super-expert speak, but I don’t want to come across as the creepy guy who joins a webinar and sits there in silence not contributing anything for an hour.”

Can you see what was wrong with that? The language I used immediately diminished my own value, without my even being conscious of it at the time.

Quite rightly, I was met with silence on Kendra’s part which was immediately followed up with: “Chris. You just managed to describe all of the key benefits of this model over the more traditional single models in detail, in a way that people will understand. I don’t think you have anything to worry about.”

That stuck with me all evening, and on reflecting on how I spoke about myself I realized how right Kendra was. I am here for a reason, I was invited to participate in the webinar for a reason, and people care what I have to say.

So take some time for you, take a good hard look at how you speak about yourself, your accomplishments and your mistakes and realize, you are anything but an idiot. You are wonderful.

The best vegan salted caramel apple and pear crumble/crisp I ever did see!

“Pull up a chair. Take a taste. Come join us. Life is so endlessly delicious.”
Ruth Reichl

Imma stop you right now. I know how this goes: you open a blog post about food and you expect three things – a story about why I made it, the recipe, and pictures.

I have no pictures. We ate it. I’m sorry, I couldn’t wait.

There’s no story either. My mum and step-dad were coming round for dinner last night (we made a roasted cauliflower, new potato and mushroom curry which was insanely good – yay, go my wife for being an awesome cook!) but we needed a dessert.

I found a couple of recipes I liked but each had elements of the other I wanted to include. So I made my own! Specific thanks to Bosh! for the Salted Caramel recipe in particular from their UK show “Living on the Veg“.

So without further ado, or description of the weather outside or why I love comfort food. Here’s a recipe you should try!

For the filling:

  • 1 x Pink Lady apple
  • 1 x cooking apple
  • 1 x braeburn apple
  • 3 x rocha pears
  • 3 tablespoons maple syrup
  • 2 teaspoons cinnamon
  • Pinch of Salt
  • 1 teaspoon coconut oil

For the salted caramel:

  • 100g pitted dates (Medjool desirable)
  • 120ml hot water
  • Large pinch flaked sea salt

For the crumble/crisp topping:

  • 75g vegan butter / coconut oil
  • 50g Soft Brown Sugar
  • 115g all-purpose flour
  • 1 heaped tablespoon ground almonds
  • 1 teaspoon cinnamon
  • 1 small handful rolled oats


  1. Pre-heat oven to 180 C (350 F)
  2. Peel, core and chop apples and pears into cubes, about the size of a large thumbnail
  3. Melt the coconut oil in a pan over a medium heat and then add the apples, pears, salt, cinnamon and maple syrup then cook for 10 minutes until soft
  4. Whilst the fruit is cooking add the topping ingredients to a bowl and (using your hands) combine into a breadcrumb style texture
  5. Add the dates, water and salt to a blender / nutribullet and blend until smooth
  6. Take the fruit off the heat and stir in the date caramel mixture to combine, then pour the contents into an oven-proof dish
  7. Top with the crumble mixture and lightly press down to ensure all gaps are filled
  8. Bake uncovered for 30 minutes then serve piping hot with vegan custard or cream-alternative


Accountability, not blame

“When you blame and criticize others, you are avoiding some truth about yourself.”
Deepak Chopra

Last week, I didn’t blog.

Why? Well there is no real excuse, I had opportunities to and I prioritized other things around and over it which led to a simple and inescapable fact: for the first time this year, I didn’t blog.

Every day I speak to people about development processes and adopting more agile methodologies; utilizing source control effectively and deploying better code more frequently. One of the biggest wins in this process is the level of accountability managers and teams are able to adopt. You can see exactly who has been doing what, when and why and this allows you to communicate more effectively, share ideas and ultimately deliver more value to the people consuming your end product.

One thing that always gets me though is when people, and there are many, talk about the ability to catch bad code, and holding developers accountable for their actions. Some people see this for what it really is, a feedback loop. Feedback is the single greatest thing one can receive on any of life’s paths; whether this be feedback on an essay you wrote, or on your personal tone and manners in social situations. The goal of feedback is for you to learn, to adapt, to grow.

But some people see it as a ‘blame game’, holding people bang-to-rights and using feedback to attack, rather than nurture. Trying to adapt what is primarily a process for growth and turning it into an opportunity to escape accountability for your own actions and to make another feel inferior as a by-product is a clear indicator of someone who is, as Deepak says, avoiding a deep and potentially unsettling truth about themselves.

This style of blame is akin to bullying or emotional abuse, whether inside or outside the workplace, and is categorically not the style of accountability (if you can call it that) that you or anyone should be adopting. Crucially:

If you see or hear anyone giving ‘feedback’ like this, take the opportunity to speak to that person (when possible) and explain why what they are doing is so destructive.

The best way to tackle destructive negative feedback is with constructive feedback and understanding.

So no, I didn’t blog. Do I feel bad about this? Well, perhaps in some way – but crucially, the reason I didn’t blog is that I consciously prioritized other things in my life that required my attention, and I’m glad I did. The other things I’m working on, both personally and professionally, are making me happy and fulfilled. Blogging will come with time, and there will be weeks where I am not able to, but I will catch up as priorities shift back to normal, stable, day-to-day levels.

I have learned from this week that there are better ways to organize my time, and this is important feedback to give myself to ensure that in the future I can get through everything I would like to and still have time for me – so it has been a very important learning curve.

This week, my wife and I are on holiday and I have made the decision to turn on my out-of-office and turn off work-email notifications and I will be using this time instead to focus on the one person in the world who makes me feel like nothing can stop me, who’s always there with a thoughtful, beautiful smile – and that is the very least she deserves.

I am accountable to myself to ensure I am doing the things I must, and behaving as I should – but I refuse to blame myself for taking time out for my mental health, for taking opportunities to grow… and for not prioritizing blogging.

I hope you all have a wonderful week and that your 2020 is off to the most wonderful start as we leave a cold and dark January behind us and I’ll see you back here for more of what you’ve seen so far and more!

P.S. £10 for the missed week takes the total for the ‘2020 blogging challenge’ donation to MIND up to £130 in December, a happy product of holding myself accountable.

Provisioning local Dev copies from AWS RDS (SQL Server)

“It’s still magic even if you know how it’s done.”
Terry Pratchett

For a long time now I have worked heavily on Redgate’s SQL Provision software. I was involved from the very beginning, when SQL Clone was but a few months old and just finding its little sheepish (teehee) legs in the world, and before Redgate Data Masker was “a thing”.

The one thing I’ve never comprehensively been able to help a customer with, though, was PaaS. Platform as a Service has plagued my time with this wonderful software, because you simply cannot take an Image (one of SQL Clone’s VHDX files) directly from an Azure SQL Database or from an Amazon AWS RDS instance.

But then in January 2019 I did some research and wrote this article on how you could achieve provisioning from Azure through the BACPAC file export method. This was great, and several customers decided it was good enough for them and adopted it – in fact, they completely PowerShell-ed out the process (links to something similar in my GitHub, which I used for my PASS Summit 2019 demo) – however, this never solved my AWS problem.

I’ll be the first to admit, I didn’t even try. AWS for me was “here be dragons” and I was a complete n00b; I didn’t even know what the dashboard would look like! However, in early December 2019 I was on a call with a customer who mentioned that they would like to provision directly from RDS SQL Server, without any “additional hops” like the BACPAC Azure method. On the same day, Kendra Little (sorry Kendra, you seem to be the hero of most of my blogs!) shared some insight that it was possible, with AWS, to output .bak files directly to an S3 bucket. That got me thinking: if we can get access to a .bak file directly from S3, surely we could provision it all the way to dev with little-to-no involvement in the process?

My reaction to this news was that it was the universe telling me to get off my backside and do something about it, so with renewed determination, and looking a little bit like this:


I set off into the world of AWS.

1 – Setup

Now naturally, I am not a company. Shock. So I don’t have any pre-existing infrastructure available in AWS to tinker with, and that was the first challenge. “Can I use anything in AWS for free?” – the answer? Actually, yes! AWS has a free tier for people like myself who are reeeeeally stingy curious, which at the very least would let me better understand how to interact with the various moving parts.

First step: I’m going to need a database in RDS, so I went to my trusty DMDatabase (scripts here for y’all) which I use for EVERYTHING – on-premise, in Azure, EV-ERY-THING.

In AWS I went to RDS and set up a SQL Server Express instance called dmdatabaseprod (which fortunately kept it on the free tier). Luckily, AWS provides an easy getting-started tutorial for this, which you can find here – why re-invent the wheel? After creating the DB I had some major issues actually connecting to it in SQL Server Management Studio; I thought I had allowed all the correct ports for traffic, put it in the right security groups, blah blah blah… and guess what it was?

Public accessibility. Was set. To “No“. *cough* well that problem was sorted quickly so it was onto the next challenge.

2 – Backing up to an S3 bucket

I can take no credit for this step whatsoever because it was the wonderful Josh Burns Tech who saved me. He created a video showing exactly what I wanted to do and you can see this, with full instructions and scripts here:

After following the advice of Josh and walking through his steps, getting a new S3 bucket setup and configured and creating a new backup of my DMDatabase, I was a good step of the way there! As you can see my .bak was nicely sat in my S3 bucket – marvelous!
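For reference, the native backup Josh walks through comes down to AWS’s documented RDS stored procedures – the database name and bucket ARN below are placeholders for your own:

```sql
-- Kick off a native .bak backup from RDS straight into the S3 bucket
exec msdb.dbo.rds_backup_database
    @source_db_name = 'DMDatabase',
    @s3_arn_to_backup_to = 'arn:aws:s3:::my-bucket/DMDatabase.bak',
    @overwrite_s3_backup_file = 1;

-- The backup runs asynchronously; poll until the task reports SUCCESS
exec msdb.dbo.rds_task_status @db_name = 'DMDatabase';
```

Note that this requires the native backup/restore option group and an IAM role granting the instance access to the bucket, which is exactly the setup the video covers.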

3 – Making the S3 bucket visible to SQL Server

This was the tricky bit. My approach to solving this problem was: “I need SQL Server to be able to see the .bak file in order to create an image and clones from it. So, logically, I need it mapped as a network drive of some kind?” – simple, no? It turns out that this was the best approach from what I found online, but there were a number of ways of tackling it.

I started out using this article from Built With Cloud, which was super informative and helpful; I managed to get rClone running and the S3 bucket was showing as a local drive, which was exactly what I wanted:

But I ran into a problem – SQL Server could not access the mapped drive.

So is there another way? I found a bunch of resources online for CloudBerry, TnT Drive and MountainDuck but, like I mentioned, I’m on a very limited budget ($0), so naturally… I put this on Twitter. I received a tonne of replies giving some examples and some ideas, and the one idea that kept coming up time and time again was AWS Storage Gateway. I had never heard of it, nor did I have any idea of how it worked.

So. Back to Google (or in my case Ecosia, it’s a search engine that plants trees if you search with them, what’s not to love???)

To simplify it: Storage Gateway is a solution that is deployed “on-premise”, i.e. as a hardware gateway appliance or a virtual machine, and it allows you to effectively use your S3 (or other AWS cloud storage service) locally by acting as the middle-person between AWS and your on-premise systems, with fancy local caching giving super-low-latency network and disk performance. There are a few different types you can utilize, but for this exercise I went with “File Gateway”. From Amazon: “A File Gateway is a type of Storage Gateway used to integrate your existing on-premise application with the Amazon S3. It provides NFS (Network File System) and SMB (Server Message Block) access to data in S3 for any workloads that require working with objects.”

Sounds ideal. Time to set it up!

I have access to VMware Workstation Pro on my machine, so I downloaded the OVF template for VMware ESXi and loaded it up in VMware (the default username and password threw me a little, but it turns out it’s admin and password as standard, and you can change it as you configure):

Then it was a bit of a checkbox exercise from there:

Now I wasn’t 101% sure of exactly how best to set up my fancy new Gateway, so fortunately I found this super helpful video, funnily enough from Teach Me Cloud as opposed to the aforementioned Built With Cloud, and although it was a little out of date, I also had one of Redgate’s finest engineers (the wonderful Nick) on hand to help out. Between the video and us (mostly Nick) we were able to get everything connected!

But I ran into the same problem. SQL Server couldn’t access the backup file.


Fortunately though, after some frantic Googling we managed to find a very straightforward article that fixed all of our pain! We needed to map the drive in SQL Server itself – Thanks Ahmad! Now, yes, I did use XP_CMDSHELL (insert DBAReactions gif at the mere mention of it here) but this was for testing purposes anyway, I’m sure there are other ways to get around this problem!
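For the curious, the trick boils down to running net use from inside SQL Server via xp_cmdshell, so the mapped drive exists in the SQL Server service’s own session rather than just yours. Here’s a hedged Python sketch that merely assembles that T-SQL – the share path and credentials are placeholders, not real values from my setup:

```python
def build_mapping_sql(drive: str, unc_path: str, user: str, password: str) -> str:
    """Assemble the T-SQL that maps a network drive from *inside*
    SQL Server, so the mapping lives in the service's own session."""
    return (
        "EXEC master..xp_cmdshell "
        f"'net use {drive}: {unc_path} /user:{user} {password}';"
    )

# Placeholder share and credentials -- substitute your gateway's values
sql = build_mapping_sql("Z", r"\\storage-gateway\dmdatabase-share", "gateway", "p@ssw0rd")
print(sql)
```

Remember xp_cmdshell has to be enabled via sp_configure first and, as mentioned above, it’s a test-rig convenience rather than something to leave switched on in production.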

…and guess what? It worked. Huzzah!

If you can look past my poorly named image “blah” and assume it instead says “HeyLookImAnImageFromAnRDSBackupFileArentIClever”, this means I can now schedule my PowerShell process to create new images at the beginning of every week to refresh my DMDatabase Clone environment – no manual steps needed!
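The decision that weekly process makes is really just an age check on the image. A minimal Python sketch of that logic – the seven-day threshold simply mirrors the weekly cadence, and the dates are arbitrary examples:

```python
from datetime import datetime, timedelta

def image_is_stale(created_on: datetime, now: datetime, max_age_days: int = 7) -> bool:
    """True when an image is due for its weekly refresh."""
    return now - created_on > timedelta(days=max_age_days)

# An image created on the 3rd: stale by the 11th, fine on the 5th
created = datetime(2020, 3, 3)
assert image_is_stale(created, datetime(2020, 3, 11)) is True
assert image_is_stale(created, datetime(2020, 3, 5)) is False
```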


Whilst there are a number of steps involved, you can easily take advantage of some of the fantastic features offered by AWS, like Storage Gateway, and even if your database is hosted in RDS, you can fully provision copies back into IaaS (Infrastructure as a Service) or on-premise workflows to keep your Pre-Production copies up to date and useful in development!

Just remember to mask it too!

P.S. I’m sure you could probably find some clever way of using the free rClone method I also mentioned and having this readable by SQL Server, but I haven’t figured it out yet, but will blog when I do!

DBAle hits 5000 listens

“We see our customers as invited guests to a party, and we are the hosts. It’s our job every day to make every important aspect of the customer experience a little bit better.”
Jeff Bezos

The date is June 6th 2018. Our producer Louise has just set up a Slack channel called #dbale with the topic: “Hi, thought i’d just set this up as it might be the easiest way to chat about the podcast idea at this early stage.” – purely off the back of a conversation we had after a practice run for this webinar, where I suggested to Chris and Lou that I loved working with the 2 of them and that we should do some kind of podcast together.

We produced our first DBAle episode and it went live around the 12th July 2018. The production quality was very low – like, abysmal – and we didn’t really know what we were doing, but it was up. It had its own landing page, we were on Spotify and Google Play Music. It was “a thing”.

Little did we know, though, it started something. I will never try to quantify that something, because it was a weird something – something that could only be produced by 2 Chrises getting in front of a microphone, drinking a beer and talking about data.

There are a hundred different podcasts out there that are super good if you’re looking to learn more about data, like SQL Data Partners or Dear SQL DBA, and we wanted to be a bit different. So fortunately people are able to join us on our own journey of discovery and have a laugh on the way!

Fortunately, as a podcast backed by Redgate we get the added benefit of having access to some superstars as well, like episode 17 with the incredible Steve Jones, or the live episode with the wonderful Chris Yates. I have to admit I’m very excited for the next few episodes for this very reason! Look out for February’s episode as it’s sure to be wonderful, and hopefully we’ll even persuade none other than Kendra Little herself to join us soon! (Hint, Kendra, hint!!)

I guess this blog post wasn’t to give insight into how to sustain a podcast, keep motivated, or even what the formula for success is, as I’m not sure we know any of these (at least one point would be: invest in a good microphone – we found that out the hard way). But there is one thing I do know: we keep doing what we love, and at the time of writing we are at 5,300 unique downloads, so I wanted to take the opportunity to say thank you.

Thank you to everyone who has supported us, thank you to everyone who has been on or helped produce the show, and a massive thank you to everyone who has listened.

Here’s to the next 5000 listens, a year of fun, beer and friendship… oh and data. Here’s a picture of the crate of beer and scratch off bucket list Redgate (thanks Louise!) got us for hitting the 5,000 mark:

P.S. Everyone asks me what my favorite episode of DBAle was so far, so if you’re interested then check out these two (as I couldn’t pick!)

Servers that go bump in the night(time):

Monitor the state of the 8%:

Using things weirdly – Part 1: The SQL Clone Config Database

“If the challenge exists, so must the solution.”
Rona Mlnarik

I like to use things in the way they weren’t intended. It leads to behaviors you weren’t expecting… and that’s pretty fun.

Often though, when you work for a software company, people want your solutions to do things they just were not designed to do, or have no capability to do at that exact point in time. The answer that will often come back is a resounding “I’m sorry, that’s not available at this time.” Of course, that is if you’re lucky enough to get an answer at all.

It’s no secret though, none of us who answer those questions like saying that. There is nothing worse than not being able to help people with what they’re trying to do.

Now don’t get me wrong – some people’s questions should be challenged. You cannot immediately assume that the person you’re talking to has all of the available variables, or has thought of any alternative means of achieving their end goal – which is why it is always important to ask ‘the why behind the why’. Every question has at least one immediate answer:

“I am going to buy a car.”
“Because I want to drive to work.”

This seems like a perfectly reasonable response. But what if we haven’t thought about every available option here? I can take this at face value and sell you a car, but I haven’t asked where you live and work; perhaps you live within the congestion zone in London where driving is impractical. Do you actually have off-street parking available? Will you need the car for other trips or just for work? With a little additional questioning and thinking we might establish that you live right near a London Underground station and it would be more cost-effective and slightly faster to instead commute by Tube.

This is not just a sales tactic, but rather one that should be used to better understand every requirement, personal or professional – there is no substitute for good understanding.

But there are times when a perfectly reasonable request is made with good justification and unfortunately you’re still unable to help. But sometimes there are still ways to help, even if it doesn’t natively fall within the remit of the solution you’re working with. I had one such question recently:

“I would like to receive a list of Clones per image that have been created, which are older than 2 weeks and who created them, so I know which developers to nudge to refresh their Cloned environments and so we can create new images.”

Again this is a perfectly reasonable request, and the obvious answer was, well, “check the dashboard” – there’s a full list there with all of that information. When you have a good many developers all using SQL Clone, though, it can get a little difficult to scroll through everything and identify the culprits. SQL Clone has no native alerting or reporting (as of writing), so it was assumed that was the best we could do.

Like I said though, I like to use things as they weren’t intended. Tee-hee.

When you install SQL Clone it creates a back-end SQLClone_Config database on an instance of your choosing (normally where you’ve installed the solution) for recording operations carried out within the software. This database is structured so logically that anyone can understand it and leverage it to meet their requirements.

By thinking a little more about their requirement we realized that it would actually be possible to leverage the database by running a very small snippet of T-SQL which could show us:

  • The name of the Clone
  • Who created it
  • Which Image it belongs to
  • When it was created and,
  • How old it is

This was as straightforward as:

SELECT C.Name                                AS CloneName
     , C.CreatedBy                           AS CloneOwner
     , I.FriendlyName                        AS BelongsToImage
     , O.Timestamp                           AS CreatedOn
     , DATEDIFF(DAY, O.Timestamp, GETDATE()) AS AgeOfCloneInDays
FROM dbo.Clones              C
    INNER JOIN dbo.Images    I
        ON I.Id = C.ParentImageId
    INNER JOIN dbo.Operations O
        ON I.Id = O.ImageId
WHERE O.Type = 1
      AND O.Timestamp < DATEADD(DAY, -14, GETDATE())
      AND C.DeletedOn IS NULL;

This gave us exactly what we needed:

It was just that simple. But we didn’t want to leave it there! So we set up an email to be sent to the heads of each development team from a SQL Server Agent Job with the output of this query each Monday morning so they could raise it with the relevant members just after stand up. Now I had never done this before so of course who would come to my rescue but the wonderful, knowledgeable (and just all round awesome-sauce-person) Jes Schultz with this post from 2014!

And with that – we used something that isn’t exactly supposed to be used in this way, but got something valuable up and running for their teams. It’s a simple answer, but it was quick, and it worked.
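If you want to prototype the report logic outside of SQL Server first, the ageing filter from the T-SQL above translates to a few lines of Python – the clone names and owners here are invented examples:

```python
from datetime import datetime

def clones_to_nudge(clones, now, max_age_days=14):
    """Live (not deleted) clones older than two weeks, with the
    owner and age in days attached -- mirroring the T-SQL report."""
    report = []
    for c in clones:
        age = (now - c["created_on"]).days
        if c["deleted_on"] is None and age > max_age_days:
            report.append((c["name"], c["created_by"], age))
    return report

now = datetime(2020, 2, 1)
clones = [
    {"name": "dev_chris", "created_by": "chris", "created_on": datetime(2020, 1, 10), "deleted_on": None},
    {"name": "dev_lou", "created_by": "lou", "created_on": datetime(2020, 1, 30), "deleted_on": None},
]
print(clones_to_nudge(clones, now))  # only dev_chris, at 22 days old
```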

What’s a quick win you’ve had recently?

Not letting stress take over, give yourself a moment

“The time to relax is when you don’t have time for it.”
Sydney J. Harris

I work in a bustling, high-pressure environment. As a sales engineer I could be called on to do, well, pretty much anything at the last minute – and particularly at the end of quarters, when there are big pushes to try and hit various targets, you never know quite what you’ll be doing day to day. Will I be on site with a customer this week? Will I be doing 6 product demonstrations in a single day, or one long 3-hour remote troubleshooting session?

It could be anything.

It makes my job exciting and I love the prospect of having to be on my metaphorical toes, but for me and those around me it can be exhausting. Yes, it’s exciting, but never getting what I like to call “work down time” – i.e. time you can use to learn something new or tinker with a problem you’ve been thinking about for a while – can become detrimental to your mental state as the pressure starts to build.

This problem isn’t restricted to a sales environment – it can occur anywhere there is a high pressure workload, deadlines or unpredictability.

Stress is a hormonal response from the body. Adrenaline and cortisol (and others) force your body into this “ready” state where you’re constantly ready to fight or flee and it is a state that should be reserved for occasions we require it. To be in a high pressure, high stress job where you constantly feel worn out, over worked and anxious for what the day holds in store can be not only problematic for your workload as you try in vain to keep up with everything (and potentially let standards slip) but it can also have big ramifications for your health, including (but not limited to):

  • Less and/or worse quality sleep
  • High blood pressure
  • Heart problems
  • Skin irritation
  • Anxiety
  • Headaches / Migraines

It’s obvious when you’re giving in to stress, because you start making excuses. When we’re most stressed, that’s when we find ourselves identifying ways to put off tackling the thing that is causing us the issue(s), or normalizing and rationalizing the problem. We’ve all been students at one point, putting off working for deadlines (“well, I can pretty much get it done next week, I’m sure”), and even now as adults we start figuring out how much time we can sacrifice around it (“well, if I come in at 5am and just crack on with it, because no one else will be in the office…”). This is just another way for stressful activities to play on your mind and eat into our personal time and even our sleep. But that stuff is muy importante, and actually, you don’t have to put yourself through that; many people, like prison guards, astronauts and doctors, consider stress to be a normal part of the job. The key is to use stress to your advantage: be focused on the job at hand, but don’t let it overwhelm you, whatever it is you do.

Stress has always been something I’ve had difficulties overcoming and it wasn’t until 2019 when my wife and I ran an Action For Happiness “Exploring What Matters” course (check here for any courses running near you – they’re super cool!) that I realized I didn’t have to be a slave to stress.

There are so many coping techniques for stress but I wanted to just share 1 with you today (and maybe others in the future), but this is a technique I discovered in that Exploring What Matters course that you can put into action right now.

It has long been proven that meditation can have incredible health benefits for those who practice it, but the common feedback I hear on it is “but I don’t have time to meditate in the middle of the day! I have a job to do!” – whilst this may be true, meditation doesn’t just have to be sitting in a quiet room, cross-legged saying “hummmmmm” whilst sniffing incense for an hour until you find inner peace.

The video below will walk you through taking just a moment in the middle of your day to re-focus, to help you deal with stress. Sometimes we carry stress with us from call to call or meeting to meeting, and all it can take is dealing with that build-up to prevent it from affecting us and our work. I loved the course because it made me look at stress for what it was – not a big ball of mess that I had to carry everywhere with me and could do nothing about, but rather something I could choose not to feel if I didn’t want to.

I hope this video helps you as much as it’s helped me.

One thing I will say in closing though, and that is if you find stress is a big part of your daily life and it makes you agitated, anxious and weary, meditation might not be enough to help you get through. Stress can be like a big heavy ball you constantly feel is hanging from your neck, pulling you down and restricting your airways. However, things can change and you can change them. Speak to your boss, your friends and family, even a therapist about what is stressing you out; they may hold the key to help you unlock the root of the stress and therein lies the way to releasing it.

You are not alone, ever.

“Where do I even begin with data masking?” Getting started in 3 steps.

“I like to begin where winds shake the first branch.”
Odysseus Elytis

I’ve already covered data masking a little bit with a few of my first posts on this blog, and for many it will not be new to them. Performance tweaks and determinism are just par for the course in masking data, approaches to solving a well defined problem. But for many, the challenge is not how they mask their data. You’re free to write scripts, PowerShell or use a COTS software solution like Data Masker and the end result is going to be a set of data which you can securely work on; knowing that if you are going to suffer a breach of that data, it will be useless to those who get their hands on it.

The biggest problem here though, is getting started.

That phrase in itself suggests what kind of problem data masking is – anything in this life should be easy to at least get started at, with it becoming more and more complex the more you get into it. One can write a Hello World program quite simply in any major programming language but to start creating rich, functional graphical software, you’re just going to have to learn, write and test (repeat infinitely).

But data masking is like playing ‘capture the flag’ without an opposing team. Capturing the flag is a straightforward enough objective, but if you don’t know where the flag is, how big it is, what it looks like, how many flags there are etc., then your job is going to be impossible if you just jump straight into it. Like all things in this life, data masking takes careful thought and planning to understand the scale you’re dealing with – and you need to make sure you don’t narrow your scope too soon by focusing on one system or set of data.

So here are the 3 broad steps I would recommend when starting to protect data in non-production environments:

1 – Catalog your structured data

I cannot emphasize this point enough. Before you do anything, catalog the heck out of your estate. Use anything you like, and whilst I’d recommend approaches such as this or this, even an Excel sheet is a start. You must know what you hold, everywhere, before you can begin to make decisions about your data. Perhaps the goal was to mask the data in pre-Production copies of your CRM system? Well, data doesn’t just exist there in isolation… or maybe it does! You don’t know until you have a record of everything. Every column, every table, every database, every server. There should be a record, tagged with a reasonable value to indicate at the very least the following 4 things:

  1. What system it is used in
  2. Who is in charge of the stewardship of it (i.e. who should be keeping it up to date / ensuring compliance)
  3. How sensitive it is
  4. What kind of data it is

Obviously other values can be specified, like your treatment intent (masking, encryption, both etc.) or your retention policy for specific fields, even something like identifying how the data moves in-out-and-around your estate (data lineage) but at the very least it needs to highlight where the hot-spots of sensitive data exist and what is most at risk. From this you can derive the insight needed to start masking data.

Notice this step comes before any specific masking steps; that is because we should have a company-wide resource to reference our structured data regardless.
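As a sketch of what “a record, tagged with a reasonable value” might look like if you rolled your own, here’s a minimal Python shape for one catalog entry – the column, steward and tag values below are invented for illustration:

```python
from dataclasses import dataclass

@dataclass
class CatalogEntry:
    """One row of a minimal catalog -- the four must-have tags."""
    column: str       # fully qualified column name
    system: str       # 1. what system it is used in
    steward: str      # 2. who keeps it up to date / compliant
    sensitivity: str  # 3. how sensitive it is
    category: str     # 4. what kind of data it is

# Invented example entry
entry = CatalogEntry(
    column="dbo.Customers.Email",
    system="CRM",
    steward="jane.doe",
    sensitivity="Confidential",
    category="Contact information",
)
print(entry.sensitivity)
```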

2 – Scope your attack, identify the ‘hit points’

Once you have a record of everything, you can identify which systems are going to be the most at risk and which databases you should start with. As part of the new project you’re working on, you may need to mask multiple tables or even databases, but the size and scale of this process is currently stopping you from proceeding.

Identify the key tables or even databases you will need to start with – it is so rare to come across databases where sensitive information is perfectly evenly spread across every table (I’m sure there are some, of course, but many will have concentrations of data) – these ‘usual suspects’ will be in contact, account, address, payment info tables etc., and where we have multiple tables with duplicated information, we want to select the biggest behemoths for this task. The reason for this is that instead of trying to mask all tables equally, you’ll want to mask the biggest sources of truth and then “fan out” your masking, linking them back to the masked tables. Not only is this process quicker but it also allows our masking to be consistent and believable. Draw out diagrams so we can keep the actual masking to a minimum (i.e. just these hit points) and then identify where you can fan this out to. Use these almost as you would stories in development.

This approach keeps the effort of writing scripts or configuring masking rules to a minimum but keeps the impact high – preventing us from wasting time trying to mask a table which, for instance, may only hold Personally Identifiable Information by correlation with other data. Ultimately, once you have this (now hopefully smaller, more focused) list, it’ll be easier to define how you want the masked data to appear.

3 – Write. Test. Validate. Record.

Once you have an understanding of your estate and an understanding of what you’re going to be masking, it’s time to start getting some indicative masked values mapped across (names for names, dates for date of birth etc.) but this is not a big bang approach.

Just like software development this process is best done iteratively. If you try to write one monolithic masking script or rule set in a single pass, you will gain 2 things:

  1. A thing that only you understand
  2. A thing that only you can maintain

Start with a copy of the database. Restore it to somewhere locked down that only a very select few have access to. Build the first part of your script to reflect the first of the ‘stories’ you identified in step 2 – i.e. “Mask the names to be something believable”. Does this work?

No, why not?
Yes, perfect.

You should even ask for feedback on the masked data from developers – they will be working with it, so it makes sense to ensure you collaborate with them on what they would like to see.
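To make “believable and consistent” concrete, here’s a toy Python sketch of deterministic name masking. Hashing the input means the same real name always maps to the same fake one, which is what keeps the “fanned out” copies in secondary tables consistent with the masked source of truth. (The name pool is invented, and a six-name pool is nowhere near production-grade anonymization – treat this purely as an illustration of the principle.)

```python
import hashlib

# Tiny invented pool of believable replacement names (expand in practice)
FIRST_NAMES = ["Alice", "Bob", "Carol", "Dave", "Erin", "Frank"]

def mask_name(real_name: str) -> str:
    """Deterministically map a real name to a believable fake one.

    The hash makes the mapping stable: the same input always masks
    to the same output, across tables and across runs.
    """
    digest = hashlib.sha256(real_name.lower().encode()).hexdigest()
    return FIRST_NAMES[int(digest, 16) % len(FIRST_NAMES)]

# Same input (regardless of case), same output -- consistency for free
assert mask_name("Christopher") == mask_name("CHRISTOPHER")
```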

So it’s time to record what you’ve done, just as anyone would do with any other software process. If only there was some way for us to do this… oh wait!


It doesn’t matter if you’ve written a script or created part of a Data Masker masking set, it should now be put into source control, properly commented, and stored so that it is easily accessed, updated and implemented. Grant Fritchey actually wrote a great article about doing this here!

Now build up those rules, tackle your stories and gradually create enough to mask each of the ‘hit points’, then fan them out to the secondary tables. By keeping the effort minimal but the impact high, you’ll be able to try this in earnest. Once you have tackled your high-risk targets, you can add stories for specific test cases or oddities required in Dev and Test.

The point is to start somewhere, like I said, getting started is the hard part, but once you know what you have, where it is and how it’s structured, how you actually mask the data is a breeze.