|
Achieving High Availability - Part 1

In a nutshell

High availability has a broad spectrum of definitions that depend upon your environment and business needs. It all boils down to money. Anything can be accomplished given enough money. But is it really necessary?

The biggest advantage you can give yourself is in the initial selection of your hardware. Don't skimp on this. Hardware is relatively cheap nowadays. There is no reason to cut corners to save a few dollars. The few hundred dollars saved by skipping hot-pluggable drives can mean the difference between a server that runs forever and one that goes down in the first week. It never ceases to amaze me that a company has no problem approving the purchase of a $50,000 server, but will pinch a couple of dollars on the NIC or some other such component.

Buy equipment from the same manufacturer. It is much easier to manage an environment and keep it up when you have 10 Compaq servers instead of a couple of Compaqs, 3 Dells, 2 HPs, and 4 IBMs. For the most part the components don't intermix, so you have to keep extra components for several different brands instead of a small number of spares that can be used across all servers.

Everyone has their favorites for hardware. All of the major server vendors have very solid hardware. I am partial to Compaq hardware simply because of my years of experience with it. That is not to say the others are inferior. I spec Compaq hardware simply because it has proven reliable and never given me any reason to doubt it. I'll be providing product reviews for the various Compaq servers I've used, along with some additional comments, in my section on swynk.com. The bottom line is to choose a very reliable vendor and stick with them unless they give you a reason to look elsewhere.

Use hardware RAID. Some people still want to cut corners and do RAID via software or not at all. It is not worth the effort.
Not using RAID on a server is just plain dangerous because a single disk failure takes down the whole system, which defeats the purpose of availability. RAID controllers have gotten very good. In many instances, they far exceed the capabilities of software RAID. Go for the multi-channel, high-end cards. You will thank yourself down the road. The multiple channels can dramatically increase the throughput or storage capacity. These cards also include a dedicated processor, cache, and battery backup. This allows the OS to simply throw everything at the controller. The controller then performs all of the logic necessary to write to the array. The cache allows the controller to read ahead, which can increase throughput. The battery backup preserves cached writes so they still reach the disk even in the event of a server crash.

Get redundant, hot-pluggable fans and power supplies. Your server should have more than one fan and power supply. One fan should be sufficient to cool the machine. A single power supply should be sufficient to power the server at maximum load. This allows you to lose a power supply and a fan and still continue to run. Hot-pluggable components give the added advantage that you can replace a fan or power supply without taking the server offline. Some manufacturers boast about having more than one power cord going to the power supply to protect you from the cord failing. Unless you run a lawnmower through your server room or walk around and cut power cords while equipment is running, this is a pretty worthless feature.

Buy high quality, hot-pluggable disk drives. Don't buy Joe's el cheapo drive special. The most common component to fail is a disk drive, simply because drives take a beating with all of the read/write activity. The better the quality of the drive, the more you minimize this. Getting high quality drives does no good if they are not hot-pluggable. If you don't have hot-pluggable drives and there is a failure, you have to shut down the server to replace the drive.
The price difference between hot-pluggable and non-hot-pluggable drives isn't great enough to warrant this.

Use RAID! Use RAID! Use RAID! Did I say use RAID!? This is the single biggest choice you can make that will determine whether the server stays up and running or whether you have to take it offline. The best part about this? You are already buying sufficient drive space for what you need. Spend the extra money on the one or two drives you will need to implement RAID. You can then suffer a drive failure without the system going offline.

Use quality DLT drives. The price difference between DLT and DAT isn't that great. DLTs operate at a much higher speed and have a much greater capacity. Don't get the el cheapo kind. The difference between meeting and failing your high availability objectives could be your tape drive. You might be able to suffer a 1 hour outage. Is the DAT going to be able to restore 8 GB - 10 GB of data in that hour and give you time to verify it?

UPS anyone? Get a UPS to handle your servers. You can spend all of the money you want and have a power spike blow it all away. A UPS protects you from that. It also gives the server time to finish processing and then gracefully shut down. Nothing is harder on a server than a sudden power loss and power up. A UPS is your insurance policy.

Speaking of UPSes, where are you going to put them? You should have a server room to hold everything. It should be accessible only to authorized personnel. The room should be climate controlled with sufficient cooling to handle the heat the servers will put out. Running a server at very high temperatures is very hard on the components. In addition to that, many of the newer servers have thermal sensors and will shut down the server when the thermal threshold is crossed. The last thing you want to have to explain to the CEO of Amazon.com is that the company lost $50,000 in orders because you decided to save a few dollars in electricity and turned off the air conditioner.
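
The restore question raised above (can the tape drive bring back 8 GB to 10 GB inside a one-hour outage and still leave time to verify it?) comes down to simple arithmetic. The sketch below is one way to check it. The function names, the 15-minute verification allowance, and the two throughput figures are illustrative assumptions, not measured rates for any DAT or DLT drive; substitute the sustained restore rate you actually observe in a test restore.

```python
def restore_minutes(data_gb: float, throughput_mb_per_s: float) -> float:
    """Minutes needed to stream data_gb off tape at a sustained rate."""
    if throughput_mb_per_s <= 0:
        raise ValueError("throughput must be positive")
    megabytes = data_gb * 1024  # 1 GB treated as 1024 MB
    return megabytes / throughput_mb_per_s / 60


def fits_window(data_gb: float, throughput_mb_per_s: float,
                window_min: float = 60, verify_min: float = 15) -> bool:
    """True if restore plus verification fits inside the outage window."""
    total = restore_minutes(data_gb, throughput_mb_per_s) + verify_min
    return total <= window_min


if __name__ == "__main__":
    # Hypothetical sustained rates in MB/s, chosen only for illustration.
    for label, rate in (("slower drive", 1.0), ("faster drive", 5.0)):
        minutes = restore_minutes(10, rate)
        verdict = "fits" if fits_window(10, rate) else "does not fit"
        print(f"{label}: {minutes:.0f} min to restore 10 GB, {verdict} in 1 hour")
```

With these assumed rates, a drive sustaining 1 MB/s would need close to three hours for 10 GB, while one sustaining 5 MB/s would finish in about 34 minutes and leave room for verification.
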
The power to the server room should be on a separate segment from the rest of the building and should be clean and conditioned to minimize any spikes. You also have to make sure you have sufficient power and that it is segmented properly in the server room. It does no good to plug a bunch of servers pulling 100 amps into a circuit that can only handle 75 amps. The power also needs to be shielded to reduce electromagnetic interference.

I highly recommend rack mounting all of your servers. This gets all of the equipment off the floor, hides all of the wiring, and keeps the server room well organized. The power should feed through the floor and all wiring should run beneath the floor panels. This prevents having wires and cords running around for people to trip over.

Last of all, you want to consider the fire suppression unit. You might laugh, but I actually saw a sprinkler system in a server room once. Everything you do is a moot point if you dump a few gallons of water over all of those electrical components.

If you have quality hardware with RAID sitting in a clean, cool server room protected by a UPS, you will meet your availability requirements in just about all cases. For those who need to capture part of that other 0.1% of the time, there are other solutions that will be outlined in subsequent articles.

Click here to continue to part 2. |
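
The "other .1%" mentioned above implies a target of roughly 99.9% uptime. As a rough sketch of what that means in practice, the hypothetical helper below converts an availability percentage into a yearly downtime budget. The function name and the 365-day year are assumptions made for illustration only.

```python
HOURS_PER_YEAR = 365 * 24  # assumes a non-leap year


def allowed_downtime_hours(availability_pct: float,
                           period_hours: float = HOURS_PER_YEAR) -> float:
    """Hours of downtime permitted over the period at the given availability."""
    if not 0 <= availability_pct <= 100:
        raise ValueError("availability must be between 0 and 100")
    return period_hours * (1 - availability_pct / 100)


if __name__ == "__main__":
    for target in (99.0, 99.9, 99.99):
        hours = allowed_downtime_hours(target)
        print(f"{target}% uptime allows about {hours:.2f} hours of downtime per year")
```

At 99.9%, the budget works out to about 8.76 hours per year, so a single one-hour restore uses up a noticeable share of it.
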
All content on this site, except where noted, represents an original work of Michael R. Hotek and is protected by applicable copyright laws. The SQL Server FAQ is the sole work of Neil Pike. No page, portion of a page, or download may be used for commercial purposes in whole or in part without the express, written permission of the applicable author.