|   | ![]() |
|
Version: SQL Server 2005 RTM through SP2. Feature: Merge replication
SQL Server 2005 has the new capability to allow you to propagate DDL changes through the replication engine. While this is a very welcome addition, you must be extremely careful with the way that you deploy schema changes. A perfectly valid DDL script, which executes without an error on the publisher, can cause the merge replication engine to completely and irrevocably explode. Fixing this wonderful problem, sarcasm intended, means that you have to completely blow away your subscribers and start from scratch. While this might not be an issue if you have one or two subscribers, just consider having to do this with dozens, hundreds, or thousands of subscribers. You might be wondering how you manage to explode the merge replication engine with a very simple and completely valid DDL script. The steps below will give you an exact reproduction of the problem. The steps below require a single instance of SQL Server 2005 SP2 or lower and assume that replication is already enabled.
At this point, merge replication is completely broken and nothing is going to get it working again short of completely blowing away the subscriber and starting over There is someone reading this who is thinking that you could just reinitialize the subscriber. True, but consider that for a moment. This was a simple demo. Real life is not so simple. You have changes sitting on the subscriber that you are getting ready to blow away with a reinit. You might have hundreds or thousands of these subscribers sitting on the other end of very slow WAN links or you might be dealing with a rather large data set. All of these situations dictate that reinitializing all of the subscribers means you are effectively taking your business completely offline until you are done. Let us consider that you actually have the luxury of reinitializing. What about the subscribers with changes that have not uploaded? Well, there is a new option in SQL Server 2005 that allows you to upload changes first and then reinit. The undocumented fine print for this feature does NOT tell you that in order to upload changes first, the schema must match between the publisher and subscriber. Therefore, it is impossible to upload changes first and then blast the subscriber away with a reinit. The only way out of this predicament is to perform the following steps:
The steps above might look simple, but just consider doing this on a subscriber that might have hundreds of tables and millions of rows of data. Additionally, how are you going to reconcile changes that have occurred to rows on multiple subscribers. Add to that, the fact that if you are using auto-identity range management, the SET IDENITY_INSERT command is not allowed, meaning that you now have to generate a new identity value for a parent row and relink all child rows to it. As you can tell, this is an extremely bad situation. There is exactly 1 workaround that does not entail reinitializing, but it does require a significant amount of manual effort. In order to get everything running again, you have to connect to each subscriber and manually create Table2 on the subscriber. All of the DDL changes along with the incremental snapshot will then execute. Those managing small implementations might think the workaround is pretty simple and straightforward, you are the lucky ones. This situation happened at one of my customers where we have 42 active subscribers, some of which might not connect for 10 - 15 days at a time. So, you have a broken system for weeks before you can get everything repiared and that means a LOT of sleepless nights while people are screaming at you because they can't work. One of my other customers was prepping a script very similar to the one above to deploy this weekend, except they have 17,352 subscribers that would have been exploded had this happened and fixing the problem would have meant literally millions of dollars. So, the bottom line is that if you are using merge replication to deploy schema changes, you must place any table creations in their own script, add them to replication, and immediately generate a snapshot. Only then can you go back and add the columns to the child tables and create the foreign keys. Running all of this in a single script will explode your replication system. Some of you are quickly reading through this to find out the reason why this happens, after all you keep contacting me, going to conference sessions that I'm at, and attending training to find out the gory details of all of this merge metadata stuff. So, this next section is for you. For those who do not want to warp their brains, you can stop reading now. Many think that DDL changes are picked up from the transaction log by the replication engine. This is not true. DDL changes are propagated by means of a set of DDL triggers. These triggers are named: MSmerge_tr_alterschemaonly, MSmerge_tr_altertable, MSmerge_tr_altertrigger, and MSmerge_tr_alterview. Each database participating in merge replication will have these 4 DDL triggers. The purpose of the DDL triggers is to pick up any ALTER commands and place them in dbo.sysmergeschemachange. Every time you change the schema for an object that is participating in merge replication, a new schema version will be generated. A new schema version will be generated for EVERY change that is made. For example, say you want to add 6 columns to a table that is replicated, You can do this either by creating an ALTER TABLE command that adds all 6 columns. You can also do this by creating 6 separate ALTER TABLE commands that add 1 column each. If you wrap all 6 columns into a single ALTER TABLE command, the schema version will increment by 1. If you execute 6 separate ALTER TABLE commands, the schema version will increment by 6 - one schema version for each ALTER TABLE command. Additionally, every time you add a new article to a publication, the schema version is incremented. Think about this very carefully before going on and you will start to see the problem. Where this behavior creates a collision with merge replication is in the interaction of the merge DDL triggers and the sysmergeschemachange table. When we execute our demo script which explodes merge replication, the following actions occur underneath the covers:
You can verify the actions for the above steps by selecting from sysmergeschemachange. The ALTER TABLE commands will have a schematype of 300. The incremental snapshot will have a schematype of 2. You will find that the artid (article ID - found in sysmergearticles) corresponds to the table being altered. The artid for the snapshot will correspond to the newly created table, the parent. The problem is that the replication engine is sequential. It has to iterate through each schema version and apply it in order without having any intelligence whatsoever. All of the data exists in the table for the replication engine to be able to figure out that it is trying to create a foreign key to a table which is created within an incremental snapshot and therefore to add the parent table first before it attempts to create the foreign key. The sad part about this major oversight, is that the merge engine doesn't even have to create the foreign key in this case. If you were to manually stuff Table2 into the test2 database, the merge engine would happily execute down through the sysmergeschemachange table. It would perform the following actions:
Step 3 does not work properly in this case, because of the incremental snapshot. Since only Table2 is sent down, the application of the snapshot will explode because it cannot drop a table with a foreign key constraint. So, you have to manually shove Table2 into the test2 database. Then synch, which will result in a failure on the incremental snapshot. You then have to manually drop the foreign key on the subscriber. Run the synch again to get through the incremental snapshot. Then drop and recreate the foreign key on the publisher in order to get it sent down. So, you now have one more restriction on applying DDL changes with replication that is not documented by Microsoft at this time. It is illegal to execute a single script that creates a foreign key to a table already participating in replication if the parent table has not already been replicated to the subscribers. In order to make DDL changes propagate properly, you have to manipulate your DDL scripts so they work with the DDL triggers and are cognizant of the way the schema version is incremented. Therefore, all scripts that you create must ensure that the parent table has been added to the publication and a new snapshot generated which creates the entry in sysmergeschemachange for the parent table. Once the entry for the parent table is in sysmergeschemachange, you can then perform the remainder of your actions necessary to add the foreign key. One of the claims for replication in SQL Server 2005 was, "it just works". Well, after opening a half dozen support cases on the merge engine blowing up due to the state of the metadata (you will see several more of these articles posted over the next several days), I would amend this as follows: "it just works, provided you understand the precise order in which metadata entries are made into the tables and you ensure that everything you do obeys all of the restrictions, just because a script is legal to execute does not mean that it won't completely blow up the replication engine". |
All content on this site, except where noted, represents an original work of Michael R. Hotek and is protected by applicable copyright laws. The SQL Server FAQ is the sole work of Neil Pike. No page, portion of a page, or download may be used for commercial purposes in whole or in part without the express, written permission of the applicable author.