Search
  Home
  Articles
  Backup
  Books
  Certification
  FAQ
  Products
  Replication
  Scripts
  Seminars
  Training
  TSQL

MSDN Fourms
Philippine SSUG

  Resume

MHS Enterprises
FilAm Software
AcrylicAcetate.com
Bargain Humidors
Western Humidor

5-308 Implementing and Optimizing Transactional Replication Solutions with Microsoft SQL Server 7.0 - Bren Newman

This session was a makeup from the canceled session on Saturday.  The disappointing part of this session was that very little new information was gained.   The vast majority of the content was simple transactional replication basics.   Instead of spending several pages on that here, I'll just highlight some of the more important items.  I'll be covering all of the background and details in the replication book I'm working on and in several articles posted on mssqlserver.com.

Just like merge replication, transactional replication can be managed via a set of ActiveX controls.  These can be automated via the Windows Synchronization Manager.   This is shipped with IE 4 and IE5.  This gives a centralized location for managing all of your automated processes.

Transactional replication suffered from scalability problems in previous versions.   This usually resulted from contention in the replication tables in the distribution database.  The main contention came from MSrepl_commands.  This contention is reduced by the use of a varbinary column in MSrepl_commands and also from the implementation of row level locking.  With the streamlined code in the distribution agent, support for hundreds of known subscribers is now possible which can support even the largest enterprises.  SQL Server 7.0 has an additional enhancement which can further scale replication.  The implementation of anonymous subscribers is accomplished via a pull subscription.  This places the distribution agent on the subscriber and allows scaling to thousands of anonymous subscribers.  This scalability is theoretical at this time as I haven't implemented an architecture that would approach that architecture at this time and have not heard of any such implementations.  I will be working on an architecture that has more than 700 named subscribers and one component should have several thousand anonymous subscribers, but we won't be able to tell how much load a single box can handle until we are about halfway through the initial roll out.  I'll be including some scalability stats as these implementations come together.

The biggest improvement seems to be with the distribution agent.  Previous versions of transactional replication utilized a shared distribution agent to accomplish replication.  This forced replicated transactions to bottleneck at the distributor.   SQL Server 7.0 replication is now truly multi-threaded.  Each subscriber to a transactional publication and each anonymous subscriber have a dedicated distribution agent.  This allows transactions to flow to multiple subscribers at the same time.   This has resulted in a dramatic increase in the volume of data transferred by the replication engine.

The execution options of the replication agents have been improved which allows more control over the process.  you can now use both standard and integrated security via means of a public access list (PAL).  Snapshot files can be transmitted via FTP to improve the efficiency of applied a large initial snapshot.  In previous versions, replication was confined to the SQL Server engine.  All of the replication functionality has been encapsulated in executables and ActiveX controls.  This allows you to embed replication into external applications and control the entire process programmatically.

One feature of replication that many companies have tried and very few have achieved even minimal success at is bi-directional replication.  This can also be termed "chaos everywhere".  This is a system where every publisher is a subscriber and vice versa to the same set of data.  The difficulty of this comes when everyone can modify the same data.  How do you reconcile the fact of two people modifying the same row on different systems?  Replicated systems were never designed to handle this.  Merge replication has dramatically closed this gap and added mechanisms to handle this, but it still has a way to go.  I'll go into merge replication at another time.  SQL Server 7.0 has added another option into transactional replication that does a surprisingly good job of managing this with immediate updating subscribers.   This utilizes MS DTC (Microsoft Distributed Transaction Coordinator).  This gives the ability of a subscriber to process transactions while minimizing the possibility of collisions.  This option requires a very well connected environment.  When a subscriber executes a transaction, DTC immediately connects to the publisher and attempts to process the transaction in the publication database.  If everything works, the transaction is committed at the publisher and then at the subscriber.  The normal transactional replication processes then pick up at the publisher to distribute this transaction at all other subscribers except the one who initiated the transaction.   If the transaction fails at the publisher, then it is rolled back at the subscriber as well.  This prevents conflicting transactions.  It still doesn't handle one transaction overwriting another one.  Just as with merge replication, the immediate updating subscribers option is accomplished via triggers applied to the subscription database.

In this version of SQL Server, a snapshot places a lock on the table to ensure consistency.  The next version will allow snapshots to take advantage of dynamic locks so as to not place an undue burden on the system.  You now have a number of choices and options on the replication topology to meet your needs.  Transactional replication is best utilized in an environment that is well connected and requires a very high transactional throughput.  If you do not have a well connected environment, you should investigate other forms of replication unless you require tight transactional consistency.

Michael R. Hotek

All content on this site, except where noted, represents an original work of Michael R. Hotek and is protected by applicable copyright laws. The SQL Server FAQ is the sole work of Neil Pike. No page, portion of a page, or download may be used for commercial purposes in whole or in part without the express, written permission of the applicable author.