Most (maybe all) projects that I built in my professional job have required a relational database to persist data. These projects have had long lifespans. Therefore, upgrading and adding new features to software projects typically requires changes to the database schema. What are some techniques that can help ensure the database upgrades are as smooth as the software upgrades? This article will discuss a strategy that I have used successfully for a couple years.
A key assumption is that the database is .owned. by a single project. In other words, other projects or software artifacts are not directly dependent on the database.
Script Directory Layout
Database DDL (data-definition language) scripts should be generated and saved along-side the project source code. I typically create a folder structure as follows:
The etc/sql/schema/ directory will initially contain the following scripts:
The create-head.sql script contains the statments for creating the user/schema and any grants required.
The create-schema.sql contains the initial create tables, indexes and key scripts. This script will represent .version 1. of the database schema.
The drop.sql script will contain the drop user/schema statements. This script will be used to refresh the local database schema during development.
The data directory will contain sql scripts for inserts. These scripts can include required data for deployment or mock data for local development purposes.
The objects directory will contain sql scripts for store procedures, triggers, database comments. These scripts will typically be executed during each deployment.
The patches directory will contain versioned patches to apply incremental changes and updates to the schema. This is a key directory for migration. The scripts in this directory need to follow a strict naming convention. The convention can be determined by the team. A typical conventions is as follows:
Where .0001. indicates the next patchset. More on this later.
Schema Change Log
Another key piece to data migration is the Schema Change Log table. This is a table inside the schema that tracks what patchsets have been applied to the schema. This table should minimally include a column to track the schema version. Other fields that might be interesting to capture include:
- Script Name – name of actual patchset script
- Script MD5 – the MD5 hash of the patchset script
- Date Applied – the date the patchset was applied
- Build Number – the version of the source project that this was deployed
This table will be in each environment (integration, test, staging, production, etc). The build script will read this table to determine what patchsets are required for any given deploy. Any patchset script in the patches directory that has not been applied to the environment will be included in the deploy.
Whether automatically deploying the distribution or generating deploy artifacts, the build process will need to determine what patchsets are required. The following describes the high-level steps required:
- Select max version from the Schema Change Log table
- Loop through the files in the patches directory; sorted by version
- If max version < patchset version, then include the patchset in the deploy
Naturally, there are already several projects that aim to help with database migration. Two that I know of are dbdeploy (dbdeploy.com) and LiquiBase (www.liquibase.org). However, I wrote a simple groovy script to accomplish the process. I found that to be an easy solution. It allowed me to customize the process to fit my company’s deployment requirements.