Which database source control model works best for me?

“Destiny is not a matter of chance; it is a matter of choice. It is not a thing to be waited for, it is a thing to be achieved.”
William Jennings Bryan

For many people, figuring out how to get their development database into source control is the first step to a robust, repeatable, automated (and exciting) database DevOps pipeline. This, coupled with exactly which technology (Azure DevOps, Github, GitLab, BitBucket… the list goes on) you’ll be using for Source Control (and later CI/CD) can make it quite overwhelming.

overwhelmed choices GIF

Now fortunately I’ve worked with a number of teams on setting up source control methodologies and some work better than others depending on how you want to work. Remember:

“DevOps is the union of people, process, and products to enable continuous delivery of value to our end users.”Donovan Brown

And the key there is equal parts, whilst technology has a part to play, it comes down to the teams; nurturing and feeding a positive mindset of collaboration and communication within the team and then defining which methodologies and processes work best for you.

Once you’ve got that down, pick the source control methodology that works best for you, and luckily there are 4 choices:

  • State-First
  • Migrations-First
  • Hybrid- or Optimized-Model
  • Other

Ok… maybe I lied about the 4 choices because other can encapsulate many many different options in itself. But, what I’m going to talk about below are the 3 primary options I see development teams adopt and how they fit into your teams culture.

If you’re already tired of reading then you’re in luck! I also talked about this same topic at Redgate Streamed on 28th May 2020 so if you follow that link you can “register” to watch on demand, I will tell you in your own ears! (As opposed to reading below) – I won’t tell you to just watch my session because you should DEFINITELY check out the sessions also given by Kendra, Grant, Ben and Frank which were… well:*

Lets Go Yes GIF by Music Choice
*(SOOOOOO Good!)

State-First Approach

The state-first approach is, as it would suggest: the state of each object within the database is captured by whatever tool you use, i.e. the script needed to CREATE that object, and it is written out into its own flat file (most often a .sql file) in version control. The actual structure of these files and folders can vary by technology but largely it will follow a logical structure and the bottom line will be a create script per object.

When an update is made to that object, a newer version of that same create script is generated and it is added as a newer version of that script in version control and that is the latest version of the database which we can then deploy. When using state-first we have no alters, only creates, so it will be necessary to do a comparison at a later stage to work out the difference, and by extension the update/alter script that will be needed to propagate changes to later stage environments.

The benefits of the State-First method include (but are not limited to):

  • A simple approach to get started with standardizing development practices: It’s aligned with the practices we already have in place on the application development side, where source control has been standard practice for years.
  • Easy on-boarding for teams in the ‘Shared development model’: When every developer is forced to share a single development environment it can be quite hard to ensure that developers are keeping work separate; most tools that enable you to work in this model allow you to ‘lock’ objects at the database level as you work on them, or who exactly made each change that might be committed.
  • Easier to roll-back to previous state: Rollbacks are a pain with databases, but there are times where they are necessary. Maintaining a full history of the state at any given time makes it easier for us to compare and rollback environments to a state that we know worked well.

The drawbacks though of the State-First method include (but are not limited to):

  • Not as easy to achieve small, incremental deployments: Because we’re reliant on the state of the database at each stage there is still a certain element of overhead that is attached to each deployment.
  • Upgrade script determined at a later stage: Lots of people like to know EXACTLY what changes will be deployed and HOW against target environments, but because of the above reason, we’re reliant on approving changes early on, but only truly seeing how it will be deployed later in the pipeline, which doesn’t give us the same reliability or peace of mind.
  • Not as easy to refactor complex table changes: The State-First method is “How did it look to begin with and how did it look at the end?” so it doesn’t take into account the nuanced steps that may have been involved, which can be problematic when you’re adding a NOT NULL column to a table that has existing data, so these sorts of complex changes might require additional pre- and post-deployment scripts.

Migrations-First Approach

The Migrations-First approach differs significantly from the State-First approach because, as it would suggest, it relies on migrations to identify the version of the schema across environments and they usually rely on guids, numbering conventions, checksums and others to keep track of the schema, normally within a log table of their own on the affected schema itself. The migrations often come in the form of .sql files that have been written or generated and there are lots of different types, but they can be boiled down often to the idea of Repeatable, Versioned and Undo Migrations (see here on the Flyway site for a more in depth summary of these types)

The migrations then, actually contain the changes as you would like them to go out; many believe that (after testing) the script they have written is as it should be deployed, and that is exactly what is then being run against each stage. Now naturally, you need to build ON something, if you have an existing database, so many technologies will offer some sort of baselining option, to understand what already exists and what the incremental migration scripts are deploying to.

The benefits of the Migrations-First method include (but are not limited to):

  • Enables small, frequent, incremental migrations and predictable deployments: Everything is just that tiny piece of work you did, specifically. That means that only what you need to go out will go out; only what was approved at Pull Request time. This gives us high confidence that we’re sending the right changes to Prod.
  • Ideal for environments with high up-time requirements: There’s no heavy state to check, we’re just migrating these tiny changes, which means there’s far less chance of causing huge overhead on Production at deployment time.
  • Ability to use your own custom standards and code for table changes in deployments: No script generation or the ability to edit generated scripts is one of the greatest capabilities of this model. For complex changes, the steps to achieve this we KNOW that work are included, and not only that, the scripts are commented and formatted and easy to understand with our company standard, making it easy to keep track of what has been deployed.

The drawbacks though of the Migrations-First method include (but are not limited to):

  • Not as easy to pick and choose changes to be deployed: If a developer has captured multiple changes within the same script, but we only want to deploy a subset of those changes, or we don’t want to deploy to a subset of those objects right now, then it’s really hard (almost impossible) to try and unpick these changes, this also makes testing certain changes in isolation tough!
  • Higher learning curve for teams: This method is neither as easy to adopt nor as intuitive as the State-First approach, which means developers need to get used to writing their migration scripts, ensuring they’re properly formatted, commented, tested, numbered and where necessary, the undo script for those changes. This results in a much higher ask for the team; the cost for gaining the predictability of deployments.
  • Harder to roll-back changes: _On those very same undo scripts then_ they have to be absolutely perfect. It’s still much harder to undo, especially if we’re trying to undo migration 5.0.1 when we’re already on 6.1.2, everything has to cascade neatly if you’re carrying out multiple undo’s and having a water tight undo strategy is hard to nail down.

The Hybrid / Optimized Model

This particular model is a rare one to find because it is not offered widely, but where it is achievable it can offer the benefits of both the State- and Migrations-First models.

As the name would suggest, it is a combination of the state and migrations approaches into a single Hybrid model; developers store the state of their database in source control, allowing them to easily rework their changes and commit multiple times to their working branch as they develop the “end goal”, and then from this same location once those changes are confirmed, pushed and ready to go, the relevant migrations are generated from the latest state.

Now this model can be adapted into lots of different workflows: developers can all generate their own migrations from their state and check them in together when they’re happy. This records a granular history of each change that was made and how it applies to each object, and is easy to work with, and then the migration contains just what needs to go out from all of that work. Another option would be having developers make the changes and check these into a DB State folder in source control, and then having more experienced developers or DBAs etc. generate the respective migrations from the state, knowing that they have a greater confidence in the SQL specific changes that are captured in the script. This is nice because it gives cause for another pair of eyes, which again gives greater confidence in what ultimately gets deployed.

The benefits of the Hybrid / Optimized method include (but are not limited to):

  • Full granular history around object changes on a state level, but with customization, flexibility and reliability of migration scripts: Know exactly what has changed, when and by whom, but don’t worry that you don’t know exactly what change will be deployed.
  • Separation of duties for Developers and Senior Team Leads / DBAs (who generates what / who has what specialty) and a lower learning curve for developers: Easy for developers to make changes quickly and easily without having to worry about the “nitty-gritty” and exactly what SQL will be needed. Gives DBAs and senior developers peace of mind that changes are ultimately adopted and improved by people who _know_ the database.
  • Easily extends existing state-first model where migrations are needed: State-First is a great choice 70% of the time but there ARE times where data migrations or complex changes are needed. This method includes these changes where needed, instead of relying on pre- and post-migration scripts, which run globally every single time.
  • Easier to pick and choose changes to go out: Because we can choose which changes to which objects are going out in the migration scripts, it’s easier for us to grab only the ones we want to push out each time, like an additional “cherry pick” layer within the development process.

The drawbacks though of the Hybrid / Optimized method include (but are not limited to):

  • Additional step added to the process can make it feel like red-tape / added work: In some cases teams may wish to make changes and get them out _fast_ as part of continuous deployment, and could be doing so hundreds of times per day. This model can get in the way of that because it adds an additional layer of dependency.
  • Could add some time to the overall development process for new changes: This is almost exactly the same as the above reason. More steps to include, more people to include, slightly less automation than we would like _perhaps_ so naturally time to deploy increases slightly (but arguably is offset by greater confidence in the change? I’ll let you decide!)
  • Duplicated schema model in Source Control repository: Some tools keep a copy of the schema in source control as reference for the migrations, others don’t. In either case, you’re maintaining two versions of a repository, which many say should be the single source of truth, if these two are even slightly out of sync, who are we going to believe? This model calls for discipline, as sloppiness can destroy all of the proposed benefits.

aaaaaaaaaaaaaaaaaand… breathe!

Slow Down Reaction GIF by True and the Rainbow Kingdom

Conclusion

There are lots of different models you can adopt for the source controlling of your database and changes, in this post I’ve outlined 3 (well… 2 and a half really) but whatever you’re looking to adopt, hopefully this will give you greater confidence in adopting the right one.

Have a wonderful week!

2 thoughts on “Which database source control model works best for me?

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s