|   | ![]() |
|
Deploying merge replication
is generally a simple process. You simply subscribe to a merge
publication, allow the snapshot to transfer the schema and data, and you
are up and running. What happens though when you have disconnected
users, remote subscribers over slow links, unstable connections, or simply
a massive amount of data to send in the initial snapshot.
SQL Server 2000 has added some features to make deploying large publications more easily than you could in 7.0, but let's face it, very few people are deploying SQL Server 2000 and so can't take advantage of those enhancements. There is a very simple way to deploy merge replication in the scenarios listed above that takes advantage of the way merge replication works and how it tracks data. To understand how this works, we will take a short step back into some of the principles of the internal workings of merge replication. When you setup merge replication, your publisher will have a rowguid added to each table that is published. This will uniquely track each row in that table. A set of metadata tables are also created and populated. These tables have prefixes of MSmerge_... and sysmerge... In the sysmergearticles table, you have a list of all objects that are published and each is given an ID for tracking within the merge metadata. You then get entries in MSmerge_contents that give a history of every change to every row in every table that is published. The MSmerge_genhistory table will contain one row for every row in every table that is published as well and serves has the "control file" to tell the merge agent which generations of data have been sent and which have been recieved. The subscriber will contain the same information. It's sysmergearticles and MSmerge_contents will contain similar information. The MSmerge_genhistory table on the subscriber acts as the "control file" for the subscriber. With this information in hand, at any instant in time, both the publisher and subscriber can tell you which rows of data have changed and not been sent to the other side. You also have to remember that merge is entirely self contained. This means that all information that pertains to the subscriber is contained in the subscriber database. The same holds true for the publisher. This means that in theory I can take a backup from a subscriber to another subscriber and it will have all of the information it needs to start with. So, let's look at the process. I take a backup of an existing subscriber and restore it to another machine. You have continued to enter data at both publisher and subscriber after the backup started. None of this data is contained in the backup and so the new machine has no record of it. Is that a problem? No. Where are you at this point. The publisher is up to date and has a full copy of data from the subscriber. It got this full copy of data by comparing the information in MSmerge_genhistory and MSmerge_replinfo between publisher and subscriber to determine which rows of data in which tables to replicate. The original subscriber also has a full copy of the data through this same process. The new machine that you restored the backup to, is out of synch in terms of data, but the MSmerge_genhistory and MSmerge_replinfo tables, when compared to the publisher, can give enough information to the merge agent to bring it up to date. That means we have all of the information we need. The final step of the process is to create a subscription using the nonsynch option. The nosynch option is VERY important, because you don't want a snapshot sent to that machine. You will get entries in the metadata tables that tell the publisher there is a new subscriber and that it has all of the data already. The last step the merge agent will perform is a -Validate 2. This compares the publisher and the new subscriber to ensure the data is in synch. Since in our case they aren't in synch, the merge agent will download all of the changes to the new subscriber and then begin replicating as normal. The steps to follow are:
|
All content on this site, except where noted, represents an original work of Michael R. Hotek and is protected by applicable copyright laws. The SQL Server FAQ is the sole work of Neil Pike. No page, portion of a page, or download may be used for commercial purposes in whole or in part without the express, written permission of the applicable author.