|   | ![]() |
|
Time and again, I see posts to mailing lists or newsgroups stating that
identity columns do not work with replication. Identity columns do
in fact work with replication. There is simply some pre-planning
that must occur along with setting an option that you normally don't
use. This article contains the best practices that I've developed
over the years to get replicated systems to work together with identity
columns.
The first thing to remember is that you must manage your identity columns. You have data being inserted on more than one machine that replication brings together. If machine 1 inserts identity value 1 and machine 2 also inserts identity value 1, then you are going to have a collision. This means that you must ensure that he identity values being used on each machine are unique. In order to transfer data into an identity column, replication does an explicit insert into the identity column. This is accomplished using the set identity_insert on statement. This allows replication to insert into that identity column. What you have to be careful of, and is the sole cause of people saying replication doesn't work with identities, is the way an identity column behaves when you explicitly insert into it. When you insert into an identity column, the identity is reseeded using a value of 1 greater than the value you inserted into the identity column. These wreaks havoc on a replicated system because every time replication runs, you identities get reseeded. This manifests itself by a primary key violation. With respect to replication, you can disable the reseeding of an identity column by simply using the "not for replication" clause on every identity column. This takes the form of the following: create table mytable What the not for replication clause does is to tell SQL Server to disable the code that reseeds an identity column when a replication agent is inserting data into an identity column. This preserves all of the identity ranges that you have setup. The drawback to this option is that as of SQL Server 2000, you could not alter a table and add this option to it. There have been some suggestions of hacking the system tables that I won't go into here. Do NOT try to set this by hacking the system tables unless you want to spend your time restoring your database from you last good backup as it will destroy the database. The way to get the not for replication option on a column is to perform what is called an extended alter. This takes the form of the following:
Now that we've figured out how to prevent an identity column from reseeding when a replication agent kicks off, we are only left with one question: how do I partition my identity columns. The short answer to that question is: any way you want to. But, there are some very simple methods that I've used over the years that take a lot of the guesswork out of the equation. Most people will use an integer data type for identities. What few people ever take advantage of is the fact that an integer can be both positive and negative. People will always use the positive side and almost never the negative side. But using both ends of the value range makes setting up and expanding replication architectures using identities very easy. The scenarios below give you an overview of how I initially partition an identity range and also how to expand upon one as well. 2 machines
4 machines
8 machines
16 machines
The basic idea is to split your identity range into even/odd, then positive/negative as you add machines. Once you pass 4 machines, you simply begin taking each range and splitting it in half. You continue this splitting as you add machines. For extremely large implementations, this can become quite a challenge to manage. For those implementations that exceed 500 machines, I usually use a decimal data type as the identity simply because it provides me a much wider range of values to work with in sub-dividing the identity range. For those running SQL Server 2000, much of this time and effort has been automated for you. All you have to do is setup you initial ranges, and then replication will manage them for you. It does this by allowing you to specify a threshold value for the identities. Values will be inserted into your identity columns by replication as normal. When the identity value hits the threshold you have specified, replication will reseed the identity into the next highest value range available. This can leave gaps in your values, but you no longer have to manage all of the identity ranges. You do still need to specify the not for replication option with SQL Server 2000. |
All content on this site, except where noted, represents an original work of Michael R. Hotek and is protected by applicable copyright laws. The SQL Server FAQ is the sole work of Neil Pike. No page, portion of a page, or download may be used for commercial purposes in whole or in part without the express, written permission of the applicable author.