http://www.brunton-spall.co.uk/post/2014/05/06/database-migrations-done-right/
Once you do it, you’ll make releases of software much easier (simpler deployments = shorter cycle times = faster throughput generally), and you’ll find that thinking about the tasks in this way is actually much easier.
The rule is simple: you should never tie database migrations to application deploys, or vice versa. By minimising dependencies, you enable faster, easier and cleaner deployments.
These patterns often break the change into multiple database and application deployments. For example, the pattern of adding a non-nullable column to a database schema could require the following steps (sketched in code after the list):
- schema change to add a nullable column
- update the software to write to the nullable column and handle nulls on read
- perform data migration to update the null columns to have the correct data
- execute a schema change to set the column to non-nullable
- remove the null-handling code from the app
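For instance, assuming a Rails/ActiveRecord stack and a hypothetical orders.status column (neither post prescribes a framework), the schema-side steps might look like this, each shipped independently:

class AddStatusToOrders < ActiveRecord::Migration[7.0]
  def change
    # Step 1: add the column as nullable, so existing writers keep working
    add_column :orders, :status, :string
  end
end

# Step 3: backfill existing rows in batches (a one-off script or task)
Order.where(status: nil).in_batches do |batch|
  batch.update_all(status: "pending")
end

class MakeOrdersStatusNotNull < ActiveRecord::Migration[7.0]
  def change
    # Step 4: once no nulls remain, tighten the constraint
    change_column_null :orders, :status, false
  end
end

Steps 2 and 5 are application deploys rather than migrations; a sketch of those follows the walkthrough below.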
If you can only execute schema changes or deploy code around once a week or fortnight, then executing that process could take you two months: five dependent steps at that cadence is five to ten weeks. If you can make these changes hours or minutes apart, then this is a couple of days’ work for a developer at most.
The application of these patterns requires understanding that you need to make very small changes, each released to live as fast as possible and with as quick feedback as you can get.
Every change you make must be backward compatible with the rest of the system
So, looking back at our add-non-null-column change, we can break it down like this:
- Add nullable column to database – System keeps adding rows, nulls are fine, reads ignore the null
- Code change to write correct value to new rows, and handle reading unexpected nulls – Database doesn’t change, now we have some null rows and some rows with data
- Run the data migration to fill the null columns – This might be a script, or a bit of code in the application; either way, your app doesn’t care which rows it sees, since it handles both data and nulls just fine
- Add the non-null constraint – The database now has no nulls and your new code is writing the correct data.
- Remove the code that handles the null case – it won’t happen anymore.
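Continuing the hypothetical Order example, the application-side steps (2 and 5) could be as small as:

class Order < ApplicationRecord
  before_save :ensure_status

  # Step 2, write side: every row we create or touch gets a real value
  def ensure_status
    self.status ||= "pending"
  end

  # Step 2, read side: tolerate old rows that are still NULL.
  # Step 5: delete this fallback once the NOT NULL constraint is live.
  def status_or_default
    status || "pending"
  end
end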
https://stripe.com/blog/online-migrations
Engineering teams face a common challenge when building software: they eventually need to redesign the data models they use to support clean abstractions and more complex features. In production environments, this might mean migrating millions of active objects and refactoring thousands of lines of code.
- Dual writing to the existing and new tables to keep them in sync.
- Changing all read paths in our codebase to read from the new table.
- Changing all write paths in our codebase to only write to the new table.
- Removing old data that relies on the outdated data model.
Stripe Subscriptions helps users like DigitalOcean and Squarespace build and manage recurring billing for their customers. Over the past few years, we’ve steadily added features to support their more complex billing models, such as multiple subscriptions, trials, coupons, and invoices.
In the early days, each Customer object had, at most, one subscription. Our customers were stored as individual records. Since the mapping of customers to subscriptions was straightforward, subscriptions were stored alongside customers.
class Customer
  attr_accessor :subscription # at most one Subscription per customer
end
Eventually, we realized that some users wanted to create customers with multiple subscriptions. We decided to transform the subscription field (for a single subscription) into a subscriptions field, allowing us to store an array of multiple active subscriptions.

class Customer
  attr_accessor :subscriptions # an array of active Subscription objects
end
As we added new features, this data model became problematic. Any change to a customer’s subscriptions meant updating the entire Customer record, and subscription-related queries required scanning through all customer objects. So we decided to store active subscriptions separately.
In our case, we record all newly-created subscriptions into both the Customers table and the Subscriptions table. Before we begin dual writing to both tables, it’s worth considering the potential performance impact of this additional write on our production database. We can mitigate performance concerns by slowly ramping up the percentage of objects that get duplicated, while keeping a careful eye on operational metrics.
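A minimal sketch of that ramp-up; the post doesn’t show Stripe’s mechanism, so the SubscriptionsTable model and the percentage knob here are assumptions:

# Fraction of newly created subscriptions that are also written to the
# new table; raised gradually while watching latency and error metrics.
DUAL_WRITE_PERCENT = 10

def create_subscription(customer, params)
  subscription = customer.add_subscription(params) # old path: embedded in the Customer record
  if rand(100) < DUAL_WRITE_PERCENT
    SubscriptionsTable.create!(subscription.attributes) # new path
  end
  subscription
end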
At this point, newly created objects exist in both tables, while older objects are only found in the old table. We’ll start copying over existing subscriptions in a lazy fashion: whenever objects are updated, they will automatically be copied over to the new table. This approach lets us begin to incrementally transfer our existing subscriptions.
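That lazy copy might hang off the model’s save hook; the callback and the upsert key below are assumptions, not Stripe’s code:

class Customer
  after_save :mirror_subscriptions

  # Whenever an existing customer is updated, idempotently upsert its
  # embedded subscriptions into the new table.
  def mirror_subscriptions
    subscriptions.each do |sub|
      SubscriptionsTable.upsert(sub.attributes, unique_by: :subscription_id)
    end
  end
end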
The most expensive part of backfilling the new table on the live database is simply finding all the objects that need migration. Finding them by querying the production database directly would require many queries and take a long time. Luckily, we were able to offload this to an offline process that had no impact on our production databases. We make snapshots of our databases available to our Hadoop cluster, which lets us use MapReduce to quickly process our data in an offline, distributed fashion.
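Conceptually, the expensive part reduces to a scan over the snapshot rather than the live database. A single-process stand-in for the MapReduce job, with snapshot_customers and already_migrated_ids as assumed helpers reading from the offline snapshot:

# Runs against a snapshot, never production: collect the IDs of
# subscriptions that still need copying, for workers to backfill.
pending_ids = snapshot_customers.flat_map { |customer| customer.subscriptions.map(&:id) }
pending_ids -= already_migrated_ids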
We need to be sure that it’s safe to read from the new Subscriptions table: our subscription data needs to be consistent. We’ll use GitHub’s Scientist to help us verify our read paths. Scientist is a Ruby library that allows you to run experiments and compare the results of two different code paths, alerting you if two expressions ever yield different results in production. With Scientist, we can generate alerts and metrics for differing results in real time. When an experimental code path generates an error, the rest of our application won’t be affected.
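Following Scientist’s documented pattern, an experiment over the two read paths might look like this (the accessors on either side are assumptions):

require "scientist"

class SubscriptionReader
  include Scientist

  def subscriptions_for(customer)
    science "subscriptions-read-path" do |experiment|
      experiment.use { customer.subscriptions }                                  # control: old embedded data
      experiment.try { SubscriptionsTable.where(customer_id: customer.id).to_a } # candidate: new table
    end # returns the control result; mismatches are reported, not raised
  end
end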
We now want to reverse the order: write data to the new store and then archive it in the old store. By keeping these two stores consistent with each other, we can make incremental updates and observe each change carefully.
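A sketch of the flipped order of operations; mirror_to_customer_record stands in for whatever keeps the legacy copy in sync:

def save_subscription(subscription)
  # The Subscriptions table is now written first, as the source of truth...
  SubscriptionsTable.upsert(subscription.attributes, unique_by: :subscription_id)
  # ...and the legacy embedded copy is kept in sync until it's removed.
  mirror_to_customer_record(subscription)
end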
Refactoring all the code paths where we mutate subscriptions is arguably the most challenging part of the migration. Stripe’s logic for handling subscription operations (e.g. updates, prorations, renewals) spans thousands of lines of code across multiple services.
- dataVersion field