“[My] kids haven’t responded to my GDPR requests so I don’t think I’m legally allowed to tell them when dinner’s on the table.” – @mrdaveturner
Ah, masking. You’d have thought I’d be sick of it by now, no? Quite the opposite – now, more than ever, I find myself answering question after question and tackling use case after use case. So when I was asked this week:
I thought to myself, well, that sounds easy enough… and it was! I know what you’re thinking: c’mon Chris, surely there’s more to it? But no, it’s actually pretty straightforward!
I pointed them at the PowerShell module and cmdlets for SQL Provision, and the Azure DevOps plugin, to automate their whole provisioning and masking process, thinking all the while “pffft, they could have made this harder!” and then…
#1 Figure out where you want Data Masking to run in your process
This empty Azure deployment stage looks good enough for now! If you wanted to chain other processes either side of it, that’s cool too! Maybe you have your own provisioning process in place and you want to point Data Masker at it to sanitize it? Makes sense to me! For now I’m going to stick with a single agent job for simplicity.
#2 Figure out what is actually going to run Data Masker
Data Masker is a client install and as such will need to be installed on a *gasp* actual machine!
No, but seriously: any server you have lying around, physical or VM, will do the trick as long as it meets these requirements. Now, this server/VM will need to have an Azure DevOps agent on it already, which of course makes it the ideal candidate for being the “thing” that calls Data Masker – this could also be the Staging/Non-Functional/Pre-Prod environment itself, so you could copy down PROD and then immediately invoke masking.
You could, of course, dynamically replace the PARFILE with variables, so that each run picks up only the parameter file relevant to that particular database – a nice benefit!
My PARFILE simply looked like this:
It was calling a local Data Masker masking set, “AzureFun”. The thing to bear in mind is that, unless you specify otherwise, Data Masker will run with the Windows Authentication credentials of the account the Azure DevOps agent runs as. In this case, because the Azure DevOps agent already has the correct permissions to update the databases on this instance, I’m fine to use Windows Authentication:
It’s very easy to call the command line of Data Masker for SQL Server directly from Azure DevOps – but does the same approach work from other CI/CD tools? If they can call executables on the target server, then absolutely! So it’s very easily included in the process – you just have to think about where Data Masker is installed and what credentials you’re using for it!
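To make that concrete, a command-line step in an Azure DevOps YAML pipeline running on your self-hosted agent could look something like the sketch below. The install path, masking set location and the `DatabaseName` variable are all my own illustrative assumptions, and you should check the exact command-line argument syntax against Redgate’s Data Masker documentation for your version:

```yaml
steps:
  # Run Data Masker's command line on the self-hosted agent where it is installed.
  # All paths and the DatabaseName pipeline variable below are placeholders.
  - task: CmdLine@2
    displayName: 'Mask the provisioned database'
    inputs:
      script: >
        "C:\Program Files\Red Gate\Data Masker for SQL Server\DataMasker.exe"
        "C:\MaskingSets\AzureFun.DMSMaskSet"
        PARFILE="C:\MaskingSets\$(DatabaseName).txt"
```

Because the `$(DatabaseName)` variable is substituted at queue time, the same step can serve several databases, each with its own parameter file.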
Bonus Point – what about if it’s all Azure SQL Database?
You had to do it didn’t you, you had to say it!
“But Chris, now we know we can call this all from Azure DevOps, what if we wanted to mask and copy Azure SQL Databases into Dev/Test etc.?”
Well actually, the good thing is, it’s also pretty similar! When you’re connecting Data Masker to an Azure SQL DB you only need to specify this in the connections in the controller. Authentication will likely have to be SQL Authentication at this point, you need to be in Cloud mode, and I’d recommend setting the connection timeout to 10s rather than the standard 5s – but it can still be called as normal from the PARFILE:
So the Data Masker element is reasonably straightforward – that’s the good news. But the thing you REALLY need to stop and think about is:
Where are our Dev and Test copies going to BE?
Option #1: If they’re going to be on VMs or local dev and test servers / developer machines, then you could follow a similar approach to the one I laid out in this blog post for Redgate, in which you create a BACPAC file and split it out on-premises before importing it and provisioning from there. You could use the code in my GitHub to achieve something very similar. Caveat: I am no PowerShell guru – who do you think I am? Rob Sewell? Chrissy LeMaire? No. Sadly not. So feel free to build your own logic around my code – have at it, I don’t mind! ^_^
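For the export/import legs of that BACPAC route, SqlPackage is the usual workhorse. A minimal sketch, assuming SqlPackage is on the PATH – the server names, database names and credentials are all placeholders, not values from the post:

```powershell
# Sketch only: export PROD to a BACPAC, then import it on-premises for masking.
# Substitute your own servers, databases and credentials.
SqlPackage /Action:Export `
    /SourceServerName:"myserver.database.windows.net" `
    /SourceDatabaseName:"ProdDb" `
    /SourceUser:"sqladmin" /SourcePassword:$env:SQL_PASSWORD `
    /TargetFile:"C:\Temp\ProdDb.bacpac"

SqlPackage /Action:Import `
    /TargetServerName:"onprem-sql01" `
    /TargetDatabaseName:"ProdDb_Staging" `
    /SourceFile:"C:\Temp\ProdDb.bacpac"
```

From there, masking and provisioning can take over on the on-premises staging copy, as described in the linked post.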
Option #2: Keeping everything in Azure. You can copy databases around in Azure and it seems to work pretty well! So I wrote this PowerShell (also in my GitHub for y’all) to copy a PROD DB into the same resource group, mask it, and then copy it across to a Dev/Test resource group, dropping the temp copy so as not to incur lots of extra Azure costs. (This is just one of the methods I’ve seen people use – again, it’s up to you! And again, see the caveat in Option #1 above for my statement on PowerShell.) The good thing is, you can simply use ‘&’ from PowerShell to call Data Masker’s command line.
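As a sketch of that copy-mask-copy flow, using the Az.Sql module’s `New-AzSqlDatabaseCopy` and `Remove-AzSqlDatabase` cmdlets – every resource group, server, database and file path below is a made-up placeholder, and the Data Masker install path is an assumption:

```powershell
# Sketch: copy PROD, mask the copy, move it to Dev/Test, then tidy up.
# Requires the Az.Sql module and an authenticated Azure session (Connect-AzAccount).

# 1. Copy the production database into a temporary copy in the same resource group
New-AzSqlDatabaseCopy -ResourceGroupName "rg-prod" -ServerName "sql-prod" `
    -DatabaseName "ProdDb" `
    -CopyResourceGroupName "rg-prod" -CopyServerName "sql-prod" `
    -CopyDatabaseName "ProdDb_MaskTemp"

# 2. Mask the temporary copy by calling Data Masker's command line with '&'
& "C:\Program Files\Red Gate\Data Masker for SQL Server\DataMasker.exe" `
    "C:\MaskingSets\AzureFun.DMSMaskSet" PARFILE="C:\MaskingSets\ProdDb.txt"

# 3. Copy the masked database across to the Dev/Test resource group
New-AzSqlDatabaseCopy -ResourceGroupName "rg-prod" -ServerName "sql-prod" `
    -DatabaseName "ProdDb_MaskTemp" `
    -CopyResourceGroupName "rg-dev" -CopyServerName "sql-dev" `
    -CopyDatabaseName "DevDb"

# 4. Drop the temporary copy so it doesn't rack up extra Azure costs
Remove-AzSqlDatabase -ResourceGroupName "rg-prod" -ServerName "sql-prod" `
    -DatabaseName "ProdDb_MaskTemp"
```

The key design point is step 4: the intermediate copy only lives long enough to be masked and re-copied, so you pay for it by the hour, not by the month.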
Either of these options can be run from Azure DevOps also as part of your provisioning or working processes, but instead of including a call to the command line, you can run a fun PowerShell script instead:
Second Conclusion *sigh*
There are lots of ways to get what you need into Dev and Test, but these copies should be masked if they contain personal, identifying information. There are some methods above, but there are plenty of others out there on the internet, and if you’re not sure about getting started with data masking, try my post here – happy masking!
“That’s who you really like. The people you can think out loud in front of.” – John Green
Can we all take a moment to appreciate how awesome Kendra Little is? No really, go on over to her Twitter or something and remind her. Because not only is she a genius but she brings out the best in a lot of folk, and I don’t think she gets enough credit – so my quote above today is for her!
This is one of those times, though, where we stumbled on an idea and together we fleshed it out, and thanks to her ingenuity and straight-up desire to help we ended up with a full-on video about it! Thank you, Kendra! So if you don’t want to sit here and read about Reusable Schema Deployments, take a look at the video below instead, where we cover everything from the key differences between State and Migrations, to what a filter file is, to how to use YAML in Azure DevOps!
For those of you who prefer a nice read, grab a coffee or tea and a biscuit (cookie) and read on!
We find ourselves in the unenviable situation where we have a Production database that is delivered to customers to support the different applications we provide to them. When a new customer comes on board and chooses any number of our products, we deliver those products along with a copy of the database containing the set of objects specific to their setup.
Example. We produce 3 applications for our customers; Beep, Derp and Doink. In the database that we supply with these applications, we have corresponding schemas ‘Beep’, ‘Derp’ and ‘Doink’, as well as ‘dbo’ which holds a number of objects common across all instances of the database.
The question then is: “How do we deploy only the combinations of these schemas to the relevant customers AND make sure there is as little down time as possible?”
I mean, besides magic, of course!
Solution 1: Less Work, More Schema
There’s a reason why, when you buy a new ERP or CRM system for your company, many times you will receive ALL of the schema! Just bought in a new system to help manage your General Ledger, Accounts Payable and Accounts Receivable, and those are the only modules you’re going to use?
Whooooo-boy-howdy you better believe you’re going to get schema objects for Asset Management, Billing and Risk Management too!
The reason for this is that it is much easier to deliver. It is a single package that needs to be deployed everywhere and if the customer already has the relevant objects in the database then it is MUCH easier to just turn on corresponding application functionality that starts to populate and use those objects!
The problem is, if EVERY customer gets EVERY object change across EVERY schema… well, that’d be a lot of changes, and potentially some quite big ones that could impact our customers, perhaps unnecessarily. This could easily be an argument for the migrations approach to source controlling and deploying changes, but that’s one for another day!
Solution 2: You get a filter, and you get a filter, EVERYBODY gets filters!
In the state-based way of deploying we can actually use filter files (.scpf), which allow us to filter at the point of creating a database release artifact. This is a game changer, because it means we can have the convenience of developing against a single source database in Dev and source controlling ALL object changes together, and it’s only once we actually get to the point of deploying to the customer that we include the filter file in the Create Release Artifact step, to include ONLY the schema objects relevant to them.
Now, this is also a great way of doing it, because it means that everything in source control is still our single source of truth, we’re able to validate that everything builds together, and we can run unit tests broadly against all objects, etc. However, it also means that we have to maintain separate filter files either for every customer, or for every combination of schemas a customer could receive, and update them as and when people add or remove applications. It also doesn’t give us any validation that THIS particular release artifact created for that customer actually works independently of the rest of the schema objects – so we’re deploying something that hasn’t actually been tested in isolation first!
Finally, the other problem with this approach is that it is SLOOOOOOW. Like, super slow. The heaviest part of the state-based database deployment process is the creation of the release artifact – determining which changes should be included in the release going out of the door – and this overhead is being incurred independently for every. single. customer. Not fun.
Solution 3: Reduce, Reuse, Recycle.
If we take a step back from this problem and look at exactly what we’re trying to do, we want to deliver ONLY the necessary changes to the particular schema in the database that supports that specific application.
But this means that there is a commonality across customers. If, for example, we assume that we have 30 customers with a variation of the Beep schema (I’m going to ignore dbo for now because everybody has that), they may also have Derp, or Doink, or no other schemas – but the point is that all 30 of those customers will require the exact same updates to their Beep schema, full stop.
This means, if we can generate a single artifact once, that can be used for all 30 customers receiving this schema, or indeed ANY schema, then we can:
a) Reduce the number of comparisons taking place to create release artifacts
b) Reuse and recycle the same release artifacts for multiple customers
c) TEST these artifacts before they actually go out the door!
This is effectively achieved by adding an additional layer on top of the development and deployment process:
An additional step is introduced to produce a single reusable artifact prior to the final Prod deployment. Pre-Prod still receives the same package containing every object regardless of schema; however, when a Production release needs to go out, the release artifact is built against the Beep database (which, in this case, has only the Beep and dbo schemas). The pain of creating the artifact therefore sits outside the customer environment AND it is created only once, allowing us to distribute that change to any customer who may require it to upgrade their Beep schema.
The same is done for each schema in turn, which means we then deploy the fast, reusable artifacts, and the only process change required is the step immediately before deploying to Production – almost as if the independent databases are an exact mirror.
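That per-schema artifact step might look something like the following rough sketch, assuming the SQL Change Automation PowerShell cmdlets (`Invoke-DatabaseBuild`, `New-DatabaseReleaseArtifact` with its `-FilterPath` parameter, and `Export-DatabaseReleaseArtifact`) – the project path, connection string and .scpf filter file are all illustrative, not taken from the post:

```powershell
# Sketch: build one filtered release artifact per schema, reusable across
# every customer who has that schema. All paths/connections are placeholders.

# Validate the full database project once (everything still builds together)
$validated = Invoke-DatabaseBuild "C:\Repos\OurDatabase"

# Compare against the schema-specific mirror (here, the 'Beep' + dbo database),
# filtering the artifact down to just the Beep-relevant objects
$release = New-DatabaseReleaseArtifact `
    -Source $validated `
    -Target "Server=preprod01;Database=BeepMirror;Trusted_Connection=True" `
    -FilterPath "C:\Repos\Filters\Beep.scpf"

# Export once; deploy this same artifact to all 30 Beep customers
Export-DatabaseReleaseArtifact $release -Path "C:\Artifacts\Beep" -Force
```

The expensive comparison happens exactly once per schema, outside any customer environment, which is where the speed-up comes from.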
Don’t get me wrong there are challenges with this model as well.
What if we want to deploy a completely new database with a set number of schemas? Well, you may have to do an ad-hoc deployment, or add an additional process to the pile which does that creation for you!
What if we create versions 10.4.1 and 10.4.2, but these only make it as far as one of these Pre-Prod mirrors, and then we want to push 10.4.3 to customer Production environments? We will no longer receive the artifacts from .1 and .2 – only .3! Here you will have to create the specific filtered artifacts only immediately prior to deploying to Production, so that EVERY change is captured. In the video above I used a Golden DB, which had every deployment applied to it, to test my schema-specific deployments prior to deployment – but it depends on what setup you want to adopt.
Filters are incredibly powerful. If you have subtle differences in hosted environments across your customers, or even across your own DBs, they can be an ingenious way to keep all core objects AND key variations within Dev, version control and Pre-Production, whilst making sure that each target ONLY receives what it needs.
But be aware: subtle variations can snowball, and you have to be careful how you handle them, as they are not very easy to scale to multiple customers. Fortunately, in this scenario one schema mapped to one application being delivered, which makes it easy to determine who gets what – but the more differences you have, the harder they will be to continuously integrate and deliver.
But it’s happening more and more now. We have to work from home and this is starting to turn up some problems.
And no. I don’t just mean that my wife is now acutely and accurately aware of how annoying I am.
I’m talking about clones. Specifically SQL Clone, and clones do not work great from home.
Now for anyone out there who has never heard of SQL Clone, where have you been?? It’s an incredibly intuitive, reliable tool for rapid provisioning of database copies (*cough* definitely not just lifted from the website) – in any case, it’s pretty darn cool.
One of the coolest things about this technology, though, is how seamlessly it plugs into the Microsoft ecosystem by leveraging the VHD(X) technology available in x64 Windows. This means there are no special file systems and no hardware or software “appliances” – it’s very much plug and play (with a little tinkering around ports 14145 and 14146, of course!)
Naturally, though, it does come with its challenges, as does all technology, and these have been highlighted recently more than ever.
Whilst SQL Clone is a beautiful, elegant and easily automated answer to the database provisioning problem, there are two gotchas you must be aware of when you start using the solution. SQL Clone relies on the relationship between the Image file (the centralized, virtualized parent, as it were, which exists on a Windows fileshare) and the Clones themselves (the ~40MB diff disks on the hosts).
Now many people choose to put Clones onto their workstations, which for many of us is our laptops. We have SQL Server Developer installed and we pull down a few clones for different branches, for testing etc. etc. and all is well with the world.
When you’re in the office that is.
When a user queries or works with a SQL Clone and it requires any of the schema/data that was in the original copy and is not among the changes the user has made, a call is made back to the image file (the mounted VHD copy) to fetch it. When you’re in an office setting (coffee in one hand, a colleague sat next to you telling you about their weekend), this is fine: you’re connected directly to the company network, so the link between the clone on your laptop and the image file on the fileshare is short, strong and stable.
At home though this isn’t the case. Many of us work on VPNs that are “ok” on internet connections that can only be described as temperamental. So what happens when a clone tries to call back to the image file across this VPN, which is now much further away, across a sea of uncertainty and poor connection?
Bad things happen. Either the Clone cannot connect to the Image at all, decides it no longer knows what it is, and falls into Recovery Pending until the connection is re-established – or, if the connection does hold, everything is just painfully slow.
This isn’t a fault of the tool, I hasten to add, it is just the nature of the technology. Much as we would expect our RDP sessions to be a little laggy if we were based in Greenland and the server was in Australia, it is just part of life.
So… are there solutions? You bet there are!
Option 1: The “jump box”
Many people I have worked with have found that they are still able to leverage SQL Clone whilst working from home by reducing the physical distance between the Clone and the Image, and have done so by introducing a Dev/Test “jump box”.
The way this works is by having an instance of SQL Server available within the company network onto which the Clones are provisioned.
This works great because it means that the link between the Clone and the Image is once again short and strong and stable, relying on the company network, but you can easily connect to or RDP onto this jump box if you need to work with them. Still using your VPN and internet from home? Check! Able to work with a Clone now though? Check!
Option 2: The Cloud
Welcome… to the world of tomorrow!
By which, of course, I mean the world that has been available to us for a while. Infrastructure as a Service (IaaS) allows us to very easily spin up an additional dev/test server which can be used in the interim. Whilst this does incur a little extra infrastructure cost, it still means you won’t need as much space on the VMs themselves – thank you, SQL Clone!
As long as you have a fileshare available in, say, Azure or AWS, and a Windows VM, you’re pretty much good to go. Once you’ve got Clones and Images up in the Cloud, you’re reliant on the providers’ networks – and I’ve got to say, they’re not too shabby at all!
Sneaky side note: ALL demos I give of SQL Clone actually run on an EC2 VM with a 91GB Image on a fileshare and it works great!
Option 3: Roll-your-own solution
Ok, I realize this one is not really a solution. But the thing we’ve highlighted here is that it’s all about the distance between the Clone and the Image. Those two love-birds cannot do long distance.
So if you think you have another ingeniously simple way of solving this, I would recommend having a read of this magnificent document written by an incredibly clever ex-colleague of mine, along with the “How It Works” documentation, and then let me know what YOU come up with:
There are a few ways of getting around this problem but they’re not always obvious or clear, and at this trying time, you just want solutions you have in place to work. We all do.
So if you are working with any of Redgate’s tools, or even if you just have a question that I or a colleague might be able to help with – please reach out to us, talking to us won’t cost you anything, but it sure as heck might gain you something!
“One cannot think well, love well, sleep well, if one has not dined well.” – Virginia Woolf
Working from home these past couple of weeks has been, well… different.
At this point we’ve gotten used to working next to one another on the dining room table (HUGE shout out to my wife for putting up with me), we’ve blasted through a lot of movies across Disney+, Netflix and Amazon Prime (we’re well stocked) but most importantly… we’ve cooked.
Now this blog was never intended to be a food-y cook-y recipe blog by any means, stay tuned there because there is something much more exciting in the works in that arena!!!
However there have been some recipes that have just been an absolute blessing to have in these times. The ingredients for them are reasonably simple, can be substituted fairly easily where necessary and they are thoroughly nourishing (and normally last us a good period of time too!)
I really enjoy this frittata, we use the base recipe for the “egg” replacement mixture and it’s delicious – a variation we make in our house though involves cooking off 2 sweet potatoes in salt, pepper, oil and some smoked paprika, then adding red bell pepper, and Violife Vegan ‘Feta’, and instead of a skillet, using a baking dish. Honestly discovering this recipe was one of the best finds I ever made – yum yum yum!
Good for: Saturday or Sunday mornings as it is easily thrown together, and the recipe is easily doubled so you can save the cold leftovers (which are also great) for a quick lunch/brunch later in the week
This one is self-explanatory. It’s wonderful, warming, quick, easy and delicious and if you don’t already follow the Happy Pear where have you been??? Check this one out because we are already looking forward to making it again. We didn’t have sweetcorn so we subbed in Kidney Beans and we added nutritional yeast to the cashew ‘cheese’ for an extra cheesy recipe 😉
Good for: A very quick and healthy midweek dinner, fed 3 but could easily feed 4 or even 5!
This combination of Sweet Potato and Chickpeas should be illegal, it tastes JUST SO DARN GOOD. I have nothing else to add other than dig out your slow cooker and get going!
Good for: Any weekday dinner – the amount made will easily serve 4 over two nights if served with rice or couscous, etc. It’s especially handy because you prep it and leave it in the slow cooker whilst you work. Minimum hassle.
Ok this one is a cop out. Buddha bowls are by far the easiest thing to make on this list because you basically throw a bunch of things in a bowl and it’s nourishing, delicious and a bit of fun! The basic “formula” for a Buddha bowl though can be seen here on Eating Well.
Good for: Fast, healthy meals that can be changed up or perfected to your taste. There are so many permutations that there’s no real limit to what you can have. Really great also when you have young-ish kids who want to help and can pick and assist with the cooking of the various ingredients!
So far we’ve missed off a treat that you can make easily – everything else just seems a bit… healthy. Well, this one is still kinda healthy, but it comes with a beautiful, rich and very easy-to-devour chocolate filling that involves very few ingredients. All you need is a fridge and a short list of ingredients to create a deliciously indulgent vegan dessert.
Good for: Treats, birthdays, celebrations even under lock-down, anniversaries, romance or simply because it’s Tuesday. This dessert can be made by pretty much anyone on the go, and you can make small versions in little pots too if you want to spread the goodness out, or keep little ones busy!
So, these are my top 5 lock-down / isolation recipes that are easy to make, easy to have fun with and VERY easy to eat! What’re your go-to recipes?
“What you stay focused on will grow.” – Roy T. Bennett
As of Monday 16th March 2020 I have been a remote worker.
Ever since I left University I have worked in busy, bustling offices where the air runs thick with collaboration, questions and social commentary.
But now… I’m at home. I guess you could say I’m not quite sure how to come to terms with this. It certainly is strange knowing that my commute in the morning has gone from 50 minutes on the bus to 50 seconds from bedroom to dining room (via the kitchen for coffee). It’s even stranger, though, that I don’t get to see colleagues – friends – who I’ve worked with for years, and whose smiling faces, cheerful demeanor and indomitable spirits have been huge contributing factors to my desire to tackle the working day and make as much of it as I possibly can.
In the past remote working was simply the odd days I worked here and there from home, with nothing but my laptop because it was a quieter day when I could do some learning as well as some other life task like going to the doctor or dropping the car off to be serviced – therein lies the problem.
I had come to associate working from home with quieter periods of time, when I could focus on 1 thing at a time, maybe work from the sofa and treat myself to the occasional snack.
As of Monday however, the entire office has been locked down and we are _all_ working from home… for how long? I don’t think anybody knows. The only certainty in the world right now is that nothing is certain – so it is time to put into practice something that I always preach, but rarely am pushed to exercise. I will have to change my mindset.
I’m not alone in this at all – you may be reading this thinking “but Chris, working from home is so easy, I actually get more done and I’m more focused!” and if that is the case I applaud you, and I wish I could share those same abilities. However if you’re like me and suddenly working from home by directive has jarred you, here are my top 3 tips for working from home that have helped me get to grips with it and maintain my productivity:
1- Find a routine
This is perhaps the most important on my personal list, because in the past I have found myself getting out of bed at 7.30/8am on days when I worked from home. That’s ~2 hours after I would normally get up when commuting into the office, and whilst this sort of lie-in is great at the weekend for getting some rest, it also leads to me being groggy and not fully awake when the morning meetings/calls/work start, and doesn’t help with focus or building a reasonable mental list of priorities.
Take aspects of your normal daily routine and replace them, so that you build up a Monday-to-Friday (for example) that resembles the structure you enjoyed when working from the office. Here is an example of how I have changed my schedule to adapt to my new working situation: I would normally catch the bus to the office first thing in the morning, which would take anywhere between 40 and 50 minutes. Now, at exactly the same time in the morning, I go for a 30-40 minute brisk walk to simulate that commute. Match something you used to do to something creative, compelling, healthy or otherwise that you can do from home instead.
The thing to bear in mind is that finishing work for the day should also factor into your plan. It is very, very easy, when one works from home, to simply leave the laptop open, or code running, or to join a late call. These things will rapidly eat into your personal life, though, so once you hit your magic “clocking off” time… clock off.
2 – Create separation between work and home
Even though they are now one and the same, you have to keep a separation between your work and home lives. I’ve heard many people describe how they’ve set up in the study or the office – but what do you do when you live in a small flat? Or if you live with your parents, who are _also_ working from home, and you have to, say, work in your bedroom?
My solution to this, as I have a similar problem, is to make certain touches that transform both myself and the space I’m working in, so that it feels as though there is a transition from home to work.
Once I have come back from my walk I relax and make a coffee and some toast or cereal whilst watching YouTube videos; then, at 7.55am precisely, my Amazon Echo buzzes an alarm, I turn the television off and I begin “building” my work space – a process which involves taking out the monitor and accompanying cables, setting them up on the dining room table, neatly arranging where everything should go, plugging in my phone, and finally ensuring that I have a full bottle of water on standby to stay hydrated.
Conversely, at the end of the day I go through the same ritual in reverse, to the point where we are all able to sit around the dining room table and have dinner with no trace that I was ever there working. This act of transformation somehow makes the space seem different and gives it a different energy, which means I’m not thinking about work problems whilst I eat with my family.
3 – Communicate, Communicate, Communicate
Ok. This one is a no-brainer, but there are subtle levels to communication that can really help when you’re feeling the weight of working from home hovering above you like so much foreboding.
Whenever you’re feeling the pinch of distraction or you’re lacking motivation, message a colleague you would normally speak to daily at the office and ask them how they are. It seems simple, but if you rotate who this person is and just check in on them, you make them feel thought about, cared about and appreciated. Not only does this give them the motivation to carry on, but it’s a selfless act that can also revamp your own spirits – and the resulting conversation can even inspire the thoughts you need to help with whatever you’re working on.
Turn your webcam on – be seen and see other people and enjoy the smiles and the thinking and the body language you would otherwise be missing out on.
React and be reacted to – if you use Teams or Slack or any kind of collaboration tool, make sure that when people make statements or ask questions, you either offer feedback or at least react. A thumbs-up emoji at least lets people know that you are listening, allowing them to feel validated and supported, not like they’re just shouting into the void.
Learn to use a mute button – we have all been in situations where someone has mouth-breathed the meeting to postponement, or someone has had a coughing fit, or their child(ren) have come in asking questions. All of these are perfectly fine, because we’re human… but that is no excuse for everyone in the meeting, internal or external, to hear it. Use your mute button liberally on your meeting software or headset, and make sure you are able to contribute meaningfully – but also that, when you’re not contributing, you’re not hindering others from doing so.
Right now is a difficult time, very much so, and I hope you’re all doing well and staying healthy and happy. Working from home makes up a small portion of what everyone is feeling and struggling with at this point in time, so we should remember to be mindful and grateful that we are even part of companies able to support us working from home – the ability to do so is a blessing.
Remember to take your mindset and find a way to turn each of the things you struggle with into positives: recognize them, appreciate them, ask yourself why you’re feeling that way, and then find a way to deal with it that makes you more productive and/or happier. Most companies have a wealth of people who are experienced in, or dedicated to helping with, working from home – don’t be afraid to ask for help if you need it, and see if someone can help you tackle some of the problems you’re facing. Everyone is different and will need different things; just because you don’t feel comfortable working from home, for any reason, doesn’t mean you’re doing things wrong. We’re all learning together.
So stay safe, stay healthy and stay awesome and I’ll leave you with my favorite tweet so far on the matter:
P.S. We also just recorded a DBAle Special Episode on working from home, where all podcast participants were working from their respective homes. If you’re interested, you’ll be able to listen on Spotify, Apple Music and directly here: https://www.red-gate.com/hub/events/entrypage/dbale when it goes live in the next few days 🙂
“If I had an hour to solve a problem I’d spend 55 minutes thinking about the problem and 5 minutes thinking about solutions.” – Albert Einstein
So I just spent about 20 minutes trying to come up with a suitable title for this blog post, and then it struck me – one of my favorite movies of all time (which I will soon be ensuring my wife and I watch again) is Star Trek Into Darkness, featuring the magnificent Benedict Cumberbatch, in which he masterfully growls the phrase “shall we begin?”.
This sums up perfectly where people find themselves at the beginning of their classification activities. “Where do I start?” is usually the first question I get asked – and regardless of whether you’re using an Excel sheet, Azure Data Catalog or Redgate SQL Data Catalog to carry out this process, you will find yourself in the same place, asking the same question.
Classifying your SQL Server tables and columns is not straightforward, I’ll say that up front – you have to be prepared for whatever may come your way – but give yourself a fighting chance! Whether you’re looking to better understand your data, protect it, or you’re just hoping to prepare your business to better deal with things such as Subject Access Requests (SARs) and the Right to be Forgotten *cough* I’m looking at _you_ GDPR *cough* – or even just key development in systems containing sensitive information – this is the ultimate starting point. As per my blog post on data masking here, you can’t protect what you don’t know you have.
This is my effort to give you the best possible start with your classification process, whether this feeds into a wider data lineage process, data retention or, of course, data masking. So… shall we begin?
Get a taxonomy set up
This is perhaps the most crucial part of your success. Even if you have the best classification process in the world it really means nothing if you’ve basically described your data in one of say, 3 possible ways. The thing to bear in mind before getting started is that the data cataloging process is not specific to one job.
You may think at this point in time that you’re going to use it to highlight what you want to mask for Dev/Test environments, or maybe it’s your hit list for implementing TDE or column level encryption – but this _thing_ you’re building is going to be useful for everyone.
DBAs will be able to use this to help them prioritize the systems they look after and be more proactive when it comes to security checks, updates, backups etc.
Developers will be able to use this to better understand the tables and environments they are working on, helping them contextualize their work and therefore engage and work with any other teams or individuals who may be affected or who may need to be involved.
Governance teams and auditors will be able to use this to better understand what information is held by the business, who is responsible for keeping it up to date and how it is classified and protected.
The list goes on.
So all of the above will need to be engaged in a first run to help actually describe the data you’re working with. What do you actually care about? What do you want to know about data at a first glance? Below is the standard taxonomy that comes out of the box with Redgate’s Data Catalog:
Some of my favorites are in here, which I would encourage you to include as well! If nothing else, having Classification Scope as a category is an absolute must – but I’ll come to this soon. You can see, though, how being able to include tags such as who owns the data (and is therefore in charge of keeping it up to date), what regulation(s) it falls under, and even what our treatment policy is in line with those regulations, gives us so much more to go on. We can be sure we are appropriately building out our defensible position.
Having a robust Taxonomy will enable you to not only know more about your data but to easily communicate and collaborate with others on the data you hold and the structure of your tables.
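To make the idea concrete, a taxonomy like the one described can be sketched as a simple lookup of categories to allowed tags. This is a tool-agnostic Python illustration – the specific category names and tag values below are my own assumptions drawn from the examples above, not the product’s out-of-the-box list:

```python
# Illustrative taxonomy built only from the categories discussed above.
# Every tag value here is an assumption for the sake of the example.
TAXONOMY = {
    "Classification Scope": ["In Scope", "Out of Scope", "Unclassified"],
    "Data Owner":           ["HR", "Finance", "Sales", "IT"],
    "Regulation":           ["GDPR", "HIPAA", "PCI-DSS", "None"],
    "Treatment Intent":     ["Mask for Dev/Test", "Encrypt at Rest", "No Action"],
}

def valid_tag(category: str, tag: str) -> bool:
    """Check a proposed tag against the agreed taxonomy before applying it."""
    return tag in TAXONOMY.get(category, [])
```

Keeping the taxonomy in one agreed place, whatever form it takes, is what lets DBAs, developers and governance teams all speak the same language about the data.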
Decide who is in charge
This seems like an odd one, but actually one of the most common questions I get is about who will be carrying out the classification process, and this is where the true nature of collaboration within a company is going to be absolutely critical.
Some people believe that a DBA or a couple of developers will suffice, but as you’ll see later on, this is not a simple process that only one or two people can handle by themselves. Be prepared to spend hours on this; implementing classification is, by nature, going to need a team effort in the first instance.
You will need representation from people who know the database structure, people who know the function of the various tables, and people who know the business and how data should be protected – collaboration between Dev, DBAs, Testers, Governance and DevOps – and you will need someone central to coordinate this effort. When you have key representation from these teams, it will be easier to identify and collaborate on hot spots of data, so ensure you have this knowledge up front.
Get rid of what doesn’t matter
You may be surprised that the next step is technically an execution step, but it is an important point nonetheless and will absolutely help with the classification effort. This is where the Classification Scope category comes in, and this is why it’s my favorite.
One of the biggest problems people face when actually executing on their classification is the sheer enormity of the task. There is no “average” measure we can rely on, unfortunately, and even small schemas can be substantial – recently, a customer provided me with just ONE of their database schemas, which had well in excess of 1,800 columns across dozens of tables. Scale that up to potentially hundreds of databases and it rapidly becomes clear that going over every single column is unmanageable.
To start, then, the knowledge brought by the team mentioned above will be invaluable, because we’re going to need to “de-scope” everything that is not relevant to this process. It is very rare to find a company where more than 50% of the columns in a database contain PII/PHI – and even if you are one of those companies, this process can help you too.
There could be many reasons that something shouldn’t be included in this process. Perhaps it is an empty table that exists as part of a 3rd party database schema, such as in an ERP or CRM solution. It could be a purely system-specific table that holds static/reference data or gathers application-specific information. Regardless of what the table is, use the knowledge the team has to quickly identify these and then assign them all the necessary “Out of Scope” tag.
This will not only significantly reduce the number of columns you’re going to need to process, but will give you greater focus on what does need to be processed. One of the greatest quotes I’ve heard about this process comes from @DataMacas (a full-on genius, wonderful person and someone who over the years I have aspired to learn as much from as possible), who referred to it as “moving from a Battleships-style approach to one more akin to Minesweeper”. Which is just so incredibly accurate.
In my example database below, with only 150-odd columns, using the “Empty Tables” filter and then filtering down to the system tables I know about, I was able to de-scope just under half of the database, just as a starting point:
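The de-scoping step itself can be sketched in a few lines – here in Python as a tool-agnostic illustration, where the empty-table check and the system/reference naming prefixes (`ref_`, `audit_` etc.) are assumptions standing in for whatever filters your own catalog offers:

```python
# De-scope columns that sit in empty tables or in known system/reference
# tables, so the team can focus on what's left. Prefixes are hypothetical.
SYSTEM_PREFIXES = ("sys", "ref_", "audit_")

def descope(columns):
    """Split (table, column, row_count) tuples into in-scope and out-of-scope lists."""
    in_scope, out_of_scope = [], []
    for table, column, row_count in columns:
        if row_count == 0 or table.lower().startswith(SYSTEM_PREFIXES):
            out_of_scope.append((table, column, "Out of Scope"))
        else:
            in_scope.append((table, column))
    return in_scope, out_of_scope

inventory = [
    ("Customers", "Email", 1200),
    ("ref_Countries", "CountryCode", 250),  # static reference data
    ("LegacyOrders", "OrderNotes", 0),      # empty table
]
in_scope, out_of_scope = descope(inventory)
```

However you express it, the point is the same: tag the noise once, up front, so every later pass runs over a much smaller target.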
Figure out patterns and speed up
Many of the people carrying out this process will already have some anecdotal knowledge of the database as I’ve already mentioned, but now it’s time to turn this from what _isn’t_ important, to what is.
The fastest way to do this is to build up some examples of column naming conventions you already have in place across multiple databases – there will likely be columns with names that contain things like Name, Email or SSN in some format. Both SQL Data Catalog from Redgate and Microsoft’s Azure Data Catalog have suggestions out of the box that will look at your column names and make suggestions as to what might be sensitive for you to check and accept the classification tags.
Now these suggestions are incredibly helpful, but they both have a reduced scope because they’re just matching against common types of PII, so it’s important to customize them to better reflect your own environments. You can do this fairly easily, in one or both of the following ways:
1 – Customize the suggestions
In Redgate’s SQL Data Catalog, before even looking at the suggestions and accepting them, you can customize the regular expressions that are run over the column naming conventions so that they better reflect your own schemas – either by editing existing rules or by creating your own and choosing which tags should be associated with the matching columns:
You can then go through and accept these suggestions if you so wish, obviously making sure to give them a sense check first:
2 – POWER ALL THE SHELL
In both of the aforementioned solutions you can call the PowerShell API to carry out mass classification against columns with known formats – this will allow you to rapidly hit any known targets and further reduce the amount of time spent looking directly at columns. An example of the SQL Data Catalog PowerShell in action is below: it classifies any columns it finds where the name is like Email but not like ID (as primary and foreign keys will, in most cases, fall under the de-scoping work we did above) with a tag for sensitivity and information type (full worked example here):
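The matching logic behind that rule – “like Email but not like ID” – can be illustrated in a few lines of Python. This is only a sketch of the pattern matching, not the SQL Data Catalog cmdlets themselves, and the tag values are assumptions:

```python
import re

# Tag any column whose name matches "email" but not "id" (case-insensitive),
# mirroring the rule described above. Tag names are illustrative only.
EMAIL_RULE = re.compile(r"email", re.IGNORECASE)
ID_RULE = re.compile(r"id", re.IGNORECASE)

def classify(column_names):
    """Return {column: tags} for columns matching the email-but-not-ID rule."""
    tags = {}
    for name in column_names:
        if EMAIL_RULE.search(name) and not ID_RULE.search(name):
            tags[name] = {"Sensitivity": "Confidential",
                          "Information Type": "Contact Info"}
    return tags

result = classify(["EmailAddress", "EmailID", "CustomerName", "Work_Email"])
# "EmailID" is skipped: it matches the ID pattern, so it stays untagged
```

The same shape of rule scales to any naming convention you have – SSN, DOB, Postcode – which is what makes the scripted approach so much faster than eyeballing columns.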
Finally – get classifying
This is the last stage, or the “hunt” stage. It’s time for us to get going with classifying what’s left, i.e. anything that wasn’t de-scoped and wasn’t caught by your default suggestions or PowerShell rules.
You can obviously start going through each column one by one, but it makes the most sense to begin by filtering down to the tables with the highest concentration of columns (i.e. the widest tables), or to the columns specifically known to contain sensitive information (anecdotally or by number of tags), and classify those as in or out of scope, recording what information they hold, who owns it and what the treatment intent is at the very least.
The approach I take at this stage is to use filtering to its utmost – in SQL Data Catalog we can filter by table and column names but also by data type. Common offenders can be found among certain NVARCHAR, XML or VARBINARY types, such as NVARCHAR(MAX) – to me, that sounds like an XML, JSON, document or free-text field which will likely contain some kind of difficult-to-identify but ultimately sensitive information.
Following the NVARCHAR classification, I move on to DATETIME and INT/DECIMAL fields, looking for key candidates such as dates when someone attended an event or even a Date of Birth field. This helps especially when the naming conventions don’t necessarily reflect the information stored in the column.
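As a rough illustration of that data-type filter, the heuristic can be expressed as a simple check over column metadata – the list of suspect types here is my own hand-picked example, so tune it to your environment:

```python
# Surface columns whose SQL data types commonly hide free-text, document
# or date-of-birth style data. The type list is a hand-picked heuristic.
SUSPECT_TYPES = {"nvarchar(max)", "xml", "varbinary(max)", "datetime", "date"}

def review_candidates(columns):
    """From (table, column, sql_type) tuples, return those worth a manual look."""
    return [(t, c, ty) for t, c, ty in columns if ty.lower() in SUSPECT_TYPES]

schema = [
    ("Patients", "Notes", "NVARCHAR(MAX)"),  # free text: likely sensitive
    ("Patients", "DOB", "DATE"),             # date-of-birth candidate
    ("Orders", "Quantity", "INT"),           # plain numeric, lower priority
]
candidates = review_candidates(schema)
```

A type-driven pass like this is how you catch the columns whose names give nothing away.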
Finally, one thing to add is that you will need access to the databases or tables at some point. You can’t truly carry out a full-on data classification process purely against the schema, especially for the reason above. Data will often exist in places you weren’t aware of, and the contents, format and sensitivity of that data can only reasonably be identified and tagged if you have the support of the data team to do so.
This is not a one time thing. The initial classification is going to be the hardest part but that can be mitigated if you follow some of the processes in this post and ultimately work as a team.
Classification though, is ongoing. It is there to be an evergreen solution, something that provides context for any data governance or DevOps processes that you have in place and therefore should be maintained and treated as what it is. The place that everyone is able to use to gather information about what they will be using, and what the company might have at risk.
“One of the most painfully inauthentic ways we show up in our lives sometimes is saying “yes” when we mean “no,” and saying “no” when we mean “hell yes.”” – Brené Brown
As someone who loves to please others, I often find that I end up with a calendar that looks like someone is very good at playing Tetris, with meetings and demonstrations and training as the blocks, and my days fill up dramatically to the point where finding white space or even a lunch break is difficult to do. Sound familiar?
Let me be clear, saying “yes” is one of the most powerful statements you have to offer – it is a clear indication that you want to help someone, or that you care about what they are doing. Yes is a profoundly positive and exciting word.
But what happens when you say yes one too many times?
Meetings start to pile up, meaning you are running from one to the next and not getting time to focus on your core responsibilities
Meetings start eating into your personal time: “oh, I was hoping to leave at X o’clock, but I can probably push that a bit”… an hour… two hours… etc.
The quality of your work slips because you’re balancing too many plates and only able to give an ever-decreasing percentage of your time to individual tasks
Conflicts occur more easily when people book in follow up meetings at the same times as other meetings you’ve said yes to
People’s expectations of you shift – not knowing that you are overburdened, they expect you to thrive on that sort of “busy” and will continue to come to you with requirements
The negative ramifications of saying yes can be many, which leaves us in a fundamentally twisted catch-22, where what is supposed to be your most positive word can actually harm your work, your personal relationships and even your mental health.
Now I’m not saying that you should be asserting “no” to everything someone asks; it is very much a case of how you communicate. Communication is the best way of handling these sorts of situations because – and you may forget this often, but it’s worth reminding yourself – people are just that: people.
No one is going to fly off the handle if you tell them “I can appreciate your situation and I understand why you require me for this, however I’m currently at capacity and my current priorities are dealing with X and Y instead.” There is nothing stopping you from also suggesting other avenues they may be able to go down, or other people they may be able to speak to, to help them with their problem/meeting etc.
One of my favorite phrases in the world is “when you walk a mile in someone else’s shoes” (then you’re a mile away and you have their shoes! haha), but the key is not trying to do the same job or answer the same queries as someone – it’s merely empathizing with them and seeing it from their perspective. So when you are struggling with workload, look for proactive solutions to help, but don’t feel you have to put yourself in harm’s way to do so.
Conversely, if you work with someone who is constantly overburdened or chasing their own tail – consider, what can you do to help them? If you have additional overhead, be proactive and see if there’s anything you can do to share problems or thoughts.
A phrase I always hear when working with people to implement DevOps processes for their databases is “I simply don’t have time for this at the moment”, despite the fact that the very solutions we’re discussing – whether they are related to tooling or processes – will save them time and help them get a handle on their current and future workloads. In many situations, this can be addressed simply by breaking down some of the silos they’re experiencing in their company, and asking for help.
The top 3 tips I can offer to people who are afraid of saying no, or who flat out refuse to do so, are these:
Communicate. Tell people you feel overwhelmed, help them understand why your current load is too much so they understand what you’re up against.
Ask for help. Many consider this to be a sign of weakness, but the most successful people aren’t the ones who just do everything by themselves; they are the ones who know how to tap into the empathy, expertise and knowledge of those around them.
Manage your time better. Revisit your priority list and identify what you can delegate, or ask for help on. Make time for yourself to catch up, to do admin work or to blog, for instance! Definitely make sure you take time for things like lunches and leaving on time (even block them out in your calendar provisionally), because allowing your brain to break away from the constant pace of everyday life inspires creativity and problem solving, and will actually better prepare you for the challenges ahead.
So no – saying “no” does not mean you don’t care; in fact it means the opposite, you do care. You’re prioritizing, doing your best to ensure the demands of the job are met, and, most importantly, you’re putting your own health first.
“Be you, love you. All ways, always.” – Alexandra Elle
You’ve probably heard it a lot; in the workplace, at home, from yourself. The dreaded phrase “oh sorry, I’m just being an idiot.” It doesn’t always have to be “idiot”, it can be “moron”, “I’m being stupid”… the list goes on and on.
I’m going to say something you may or may not like; whether you do or not is irrelevant, as it does not diminish its truth.
You are not an idiot. Nor have you ever been one.
Many people use this phrase to excuse mistakes they make, or to emphasize that they knew better but followed a gut instinct and forgot the best or ‘correct’ way of doing it – but that is neither stupidity nor idiocy. It is being human.
I work with so many intensely clever people, not just when it comes to knowing about DevOps, or knowing about SQL Server, but about things in general. Regardless of what the thing you know is, you know something and in many cases more about it than many people around you and you should take pride in that. I’m not by any means insisting that you should be arrogant or full of yourself, but you should be confident about the things you know and the experiences you have had that got you to that point.
Thinking less of yourself for simple mistakes (and that’s all they are: small, easily rectified things) is damaging not only to other people’s perceptions of you but also to how you perceive yourself. The more you repeat this to yourself like a mantra, the more you undermine your self-belief and self-love; you are making a very simple but very effective statement to the world that you are not worthy.
Do you really believe that? If so, then it is time for introspection and a more fundamental soul searching exercise to lead yourself to acceptance and contentment. My feeling is though that 99% of people reading this will know that they are worthy, both of the love of other people as well as the love of themselves.
It is deeply rooted in the language we use and is an observed behavior that we grow up with and adopt into our own personal idiosyncrasies, so it is time to change up the language we use about ourselves. Take each mistake or negative feeling you have about your own knowledge, observations and/or performance, and simply change the way you describe it to yourself – this can have a huge impact on how you remember and feel about that event. Challenge the use of negative terminology and use updated, positive, self-affirming phrasing – you can find some great examples of this here: https://www.healthline.com/health/positive-self-talk#examples-of-positive-self-talk
I’ll give you a key example, as I am very guilty of doing this myself, in the hope that giving a personal context will allow you to more easily identify where you can give yourself some more love. Yesterday I had a meeting with the wonderful Kendra Little (who I have already spoken about a number of times on here, but yet again she comes to the rescue) where we were discussing an upcoming webinar that we’ll be conducting together. I asked Kendra for some additional time for us to sync up later in the week so we could best discuss the format for the webinar and do a run-through, and said (verbatim):
“I need to know roughly when each of us should be talking, because whilst I would naturally be more quiet and let the super-expert speak, I don’t want to come across as the creepy guy who joins a webinar and sits there in silence not contributing anything for an hour.”
Can you see what was wrong with that? The language I used to immediately diminish my own value, without even being conscious of it at the time?
Quite rightly, I was met with silence on Kendra’s part which was immediately followed up with: “Chris. You just managed to describe all of the key benefits of this model over the more traditional single models in detail, in a way that people will understand. I don’t think you have anything to worry about.”
That stuck with me all evening, and on reflecting on how I spoke about myself I realize how right Kendra is. I am here for a reason, I was invited to participate in the webinar for a reason, and people care what I have to say.
So take some time for you, take a good hard look at how you speak about yourself, your accomplishments and your mistakes and realize, you are anything but an idiot. You are wonderful.
“Pull up a chair. Take a taste. Come join us. Life is so endlessly delicious.” – Ruth Reichl
Imma stop you right now. I know how this goes, you open a blog post about food and you expect 3 things. A story about why I made it, the recipe and pictures.
I have no pictures. We ate it. I’m sorry, I couldn’t wait.
There’s no story either. My mum and step-dad were coming round for dinner last night (we made a roasted cauliflower, new potato and mushroom curry which was insanely good – yay, go my wife for being an awesome cook!) but we needed a dessert.
I found a couple of recipes I liked but each had elements of the other I wanted to include. So I made my own! Specific thanks to Bosh! for the Salted Caramel recipe in particular from their UK show “Living on the Veg“.
So without further ado, or description of the weather outside or why I love comfort food. Here’s a recipe you should try!
For the filling:
1 x Pink Lady apple
1 x cooking apple
1 x Braeburn apple
3 x Rocha pears
3 tablespoons maple syrup
2 teaspoons cinnamon
Pinch of Salt
1 teaspoon coconut oil
For the salted caramel:
100g pitted dates (Medjool desirable)
120ml hot water
Large pinch flaked sea salt
For the crumble/crisp topping:
75g vegan butter / coconut oil
50g Soft Brown Sugar
115g all-purpose flour
1 heaped tablespoon ground almonds
1 teaspoon cinnamon
1 small handful rolled oats
Pre-heat oven to 180 C (350 F)
Peel, core and chop apples and pears into cubes, about the size of a large thumbnail
Melt the coconut oil in a pan over a medium heat and then add the apples, pears, salt, cinnamon and maple syrup then cook for 10 minutes until soft
Whilst the fruit is cooking add the topping ingredients to a bowl and (using your hands) combine into a breadcrumb style texture
Add the dates, water and salt to a blender / nutribullet and blend until smooth
Take the fruit off the heat and stir in the date caramel mixture to combine, then pour the contents into an ovenproof dish
Top with the crumble mixture and lightly press down to ensure all gaps are filled
Bake uncovered for 30 minutes then serve piping hot with vegan custard or cream-alternative
“When you blame and criticize others, you are avoiding some truth about yourself.” – Deepak Chopra
Last week, I didn’t blog.
Why? Well there is no real excuse, I had opportunities to and I prioritized other things around and over it which led to a simple and inescapable fact: for the first time this year, I didn’t blog.
Every day I speak to people about development processes and adopting more agile methodologies; utilizing source control effectively and deploying better code more frequently. One of the biggest wins in this process is the level of accountability managers and teams are able to adopt. You can see exactly who has been doing what, when and why and this allows you to communicate more effectively, share ideas and ultimately deliver more value to the people consuming your end product.
One thing that always gets me though is when people, and there are many, talk about the ability to catch bad code, and holding developers accountable for their actions. Some people see this for what it really is: a feedback loop. Feedback is the single greatest thing one can receive on any of life’s paths; whether this be feedback on an essay you wrote, or on your personal tone and manners in social situations. The goal of feedback is for you to learn, to adapt, to grow.
But some people see it as a ‘blame game’, holding people bang-to-rights and using feedback to attack, rather than nurture. Trying to adapt what is primarily a process for growth and turning it into an opportunity to escape accountability for your own actions and to make another feel inferior as a by-product is a clear indicator of someone who is, as Deepak says, avoiding a deep and potentially unsettling truth about themselves.
This style of blame is akin to bullying or emotional abuse, either within or outside the workplace, and is categorically not the style of accountability (if you can call it that) you or anyone should be adopting, and crucially…
If you see or hear anyone giving ‘feedback’ like this, take the opportunity to speak to that person (when possible) and explain why what they are doing is so destructive.
The best way to tackle destructive negative feedback is with constructive feedback and understanding.
So no, I didn’t blog. Do I feel bad about this? Well, perhaps in some way – but crucially, the reason I didn’t blog is that I consciously prioritized other things in my life that required my attention, and I’m glad I did. The other things that I’m working on, both personally and professionally, are making me happy and fulfilled. Blogging will come with time and there will be weeks where I am not able to, but I will catch up as priorities shift themselves back to normal, stable, day-to-day levels.
I have learned from this week that there are better ways to organize my time, and this is important feedback to give myself to ensure in the future I am able to get through everything I would like to and to still have time for me, so it has been a very important learning curve.
This week, my wife and I are on holiday and I have made the decision to turn on my out-of-office and turn off work-email notifications and I will be using this time instead to focus on the one person in the world who makes me feel like nothing can stop me, who’s always there with a thoughtful, beautiful smile – and that is the very least she deserves.
I am accountable to myself to ensure I am doing the things I must, and behaving as I should – but I refuse to blame myself for taking time out for my mental health, for taking opportunities to grow… and for not prioritizing blogging.
I hope you all have a wonderful week and that your 2020 is off to the most wonderful start as we leave a cold and dark January behind us and I’ll see you back here for more of what you’ve seen so far and more!
P.S. £10 for the missed week takes the total for the ‘2020 blogging challenge’ donation to MIND up to £130 in December, a happy product of holding myself accountable.