Disk Arrays
The most common hardware
malfunction is probably a hard disk failure. Even though hard disks have
become more reliable over time, they are still subject to failure,
especially during their first month or so of use. They are also
vulnerable to both catastrophic and degenerative failures caused by
power problems. Fortunately, disk arrays have become the norm for most
servers, and good fault-tolerant RAID is available both in Windows
Server 2003 itself and in RAID-specific hardware supported by Windows Server 2003.
The choice of software or hardware RAID, and the particulars of how you
configure your RAID system, can significantly affect the cost of your
servers. To make an informed choice for your environment and needs, you
must understand the trade-offs and the differences in fault tolerance,
speed, configurability, and so on.
Hardware vs. Software
RAID can be
implemented at the hardware level, using RAID controllers, or at the
software level, either by the operating system or by a third-party
add-on. Windows Server 2003 supports both hardware RAID and its own
software RAID.
Hardware RAID
implementations require specialized controllers and cost significantly
more than an equal level of software RAID. However, for that extra
price, you get a faster, more flexible, and more fault-tolerant RAID.
When compared to the software RAID provided in Windows Server 2003, a
good hardware RAID controller supports more levels of RAID,
on-the-fly reconfiguration of the arrays, hot-swap and hot-spare drives, and dedicated caching of both reads
and writes.
Implementing
software RAID in Windows Server 2003 requires that you first convert
your disks to dynamic disks. That means your disks will no longer be
locally available to other operating systems, although this really
shouldn’t be a problem in a production environment since dual-boot is
rarely used there. However, you should consider carefully whether you
want to convert your boot disk to a dynamic disk. Dynamic disks can be
more difficult to access if a problem occurs, and the Windows Server
2003 setup and installation program provides only limited support for them. For
maximum fault tolerance, we recommend using hardware mirroring (RAID 1)
on your boot drive; if you do use software mirroring, make sure that you
create the required fault-tolerant boot floppy disk and test it
thoroughly before you need it.
RAID Levels for Fault Tolerance
Except for
level 0, RAID is a mechanism for storing sufficient information on a
group of hard disks such that even if one hard disk in the group fails,
no information is lost. Some RAID arrangements go even further,
providing protection in the event of multiple hard disk failures. The
more common levels of RAID and their appropriateness in a fault-tolerant
environment are shown in Table 1.
Table 1. RAID levels and their fault tolerance
Level | Number of Disks | Speed | Fault Tolerance | Description
---|---|---|---|---
0 | N | +++ | - - - | Striping alone. Not fault-tolerant (it actually increases your risk of failure), but it provides the fastest read and write performance.
1 | 2N | + | ++ | Mirror or duplex. Slightly faster reads than a single disk, but no gain during write operations. Failure of any single disk causes no loss of data and only a minimal performance hit.
3 | N+1 | ++ | + | Byte-level parity. Data is striped across multiple drives at the byte level, with the parity information written to a single dedicated drive. Reads are much faster than with a single disk, but writes are slightly slower than a single disk because parity information must be generated and written to a single disk. Failure of any single disk causes no loss of data but can cause a significant loss of performance.
4 | N+1 | ++ | + | Block-level parity with a dedicated parity disk. Similar to RAID-3 except that data is striped at the block level.
5 | N+1 | + | ++ | Interleaved block-level parity. Parity information is distributed across all drives. Reads are much faster than with a single disk, but writes are significantly slower. Failure of any single disk causes no loss of data but results in a major reduction in performance.
0+1 and 10 | 2N | +++ | ++ | Striped mirrored disks or mirrored striped disks. Data is striped across multiple mirrored disks, or multiple striped disks are mirrored. Failure of any one disk causes no data loss and no speed loss. Failure of a second disk could result in data loss. Faster than a single disk for both reads and writes.
Other | Varies | +++ | +++ | Array of RAID arrays. Different hardware vendors have different proprietary names for this RAID concept. Excellent read and write performance. Failure of any one disk results in no loss of performance and continued redundancy.
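To make the parity-based levels in Table 1 (RAID-3, RAID-4, and RAID-5) more concrete, the following sketch shows how a parity block computed as the XOR of the data blocks in a stripe allows the contents of any single failed disk to be reconstructed. It is a toy illustration of the arithmetic only; real controllers do this on raw sectors in firmware or in the driver, and the block contents and four-disk layout here are invented for the example.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR a list of equal-length byte blocks together."""
    return reduce(lambda a, b: bytes(x ^ y for x, y in zip(a, b)), blocks)

# Three hypothetical data blocks: one stripe's worth of data on a 4-disk array.
data_blocks = [b"AAAA", b"BBBB", b"CCCC"]

# The parity block is simply the XOR of all data blocks in the stripe.
parity = xor_blocks(data_blocks)

# Simulate losing disk 1: reconstruct its block from the survivors plus parity.
surviving = [data_blocks[0], data_blocks[2], parity]
reconstructed = xor_blocks(surviving)

assert reconstructed == data_blocks[1]  # the lost block is recovered exactly
```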
When choosing the RAID level to use for a given application or server, consider the following factors:
Intended use
Will this application be primarily read intensive, such as file
serving, or will it be predominantly write intensive, such as a
transactional database?
Fault tolerance
How critical is this data, and how much can you afford to lose?
Availability
Does this server or application need to be available at all times, or
can you afford to be able to reboot it or otherwise take it offline for
brief periods?
Performance
Is this application or server heavily used, with large amounts of data
being transferred to and from it, or is this server or application less
I/O intensive?
Cost
Are you on a tight budget for this server or application, or is the
cost of data loss or unavailability the primary driving factor?
You need to evaluate
each of these factors when you decide which type of RAID to use for a
server or portion of a server. No single answer fits all cases, but the
final answer requires you to carefully weigh each of these factors and
balance them against your situation and your needs. The following sections take a closer look at each factor and how it weighs in the overall decision-making process.
Intended Use
The intended use, and the
kind of disk access associated with that use, plays an important role
in determining the best RAID level for your application. Think about how
write intensive the application is and whether the manner in which the
application uses the data is more sequential or random. Is your
application a three-square-meals-a-day kind of application, with
relatively large chunks of data being read or written at the same time,
or is it more of a grazer or nibbler, reading and writing little bits of
data from all sorts of different places?
If your
application is relatively write intensive, you’ll want to avoid software
RAID if possible and avoid RAID-5 if other considerations don’t force
you to use it. With RAID-5, any application whose workload is more than 50
percent writes is likely to be at least somewhat slower, if not
much slower, than it would be on a single disk. You can mitigate this
to some extent by using more but smaller drives in your array and by
using a hardware controller with a large cache to offload the parity
processing as much as possible. RAID-1, in either a mirror or duplex
configuration, provides a high degree of fault tolerance with no
significant penalty during write operations—a good choice for the
Windows Server 2003 system disk.
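As a rough illustration of why write-heavy workloads suffer, the sketch below estimates the back-end disk operations generated by a given read/write mix. It assumes the textbook small-write penalties (each RAID-5 write becomes two reads plus two writes to update parity, and each RAID-1 write is duplicated); the workload split is invented for the example.

```python
def backend_ios(reads, writes, level):
    """Rough count of physical disk I/Os for a logical read/write mix.

    Assumes the textbook small-write penalties: RAID-1 duplicates every
    write, while RAID-5 turns every small write into read-data,
    read-parity, write-data, write-parity.
    """
    penalties = {"raid0": (1, 1), "raid1": (1, 2), "raid5": (1, 4)}
    read_cost, write_cost = penalties[level]
    return reads * read_cost + writes * write_cost

# Hypothetical workload: 40 percent reads, 60 percent writes out of 1,000 I/Os.
reads, writes = 400, 600
for level in ("raid0", "raid1", "raid5"):
    print(level, backend_ios(reads, writes, level))
# raid0 -> 1000, raid1 -> 1600, raid5 -> 2800 back-end operations
```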
Note
Mirroring won’t
protect you from data corruption caused by a catastrophic power
interruption to a write-cached system disk. Disabling write caching on
boot and system volumes is highly recommended if your system isn't
protected by a UPS. And no UPS can protect you from tripping over the
power cord. A good, battery-backed cache, however, will protect you even
then.
If your application is
primarily read intensive, and the data is stored and referenced
sequentially, RAID-3 or RAID-4 might be a good choice. Because the data
is striped across many drives, you have parallel access to it, improving
your throughput. And because the parity information is stored on a
single drive rather than dispersed across the array, sequential read
operations don’t have to skip over the parity information and are
therefore faster. However, write operations are substantially slower,
and the single parity drive can become an I/O bottleneck during write
operations.
If your application
is primarily read-intensive and not necessarily sequential, RAID-5 is an
obvious choice. It provides a good balance of speed and fault
tolerance, and the cost is substantially lower than the cost of RAID-1.
Disk accesses are evenly distributed across multiple drives, and no one
drive has the potential to be an I/O bottleneck. However, writes require
calculation of the parity information and the extra write of that
parity, slowing write operations down significantly.
If your
application provides other mechanisms for data recovery or uses large
amounts of temporary storage, which doesn’t require fault tolerance, a
simple RAID-0, with no fault tolerance but fast reads and writes, is a
possibility.
Fault Tolerance
Carefully
examine the fault tolerance of each of the possible RAID choices for
your intended use. All RAID levels except RAID-0 provide some degree of
fault tolerance, but the effect of a failure and the ability to recover
from subsequent failures can be different.
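A quick back-of-the-envelope calculation shows why RAID-0 actually increases your exposure while any single-parity or mirrored level tolerates one failure. The 3 percent annual per-disk failure rate and six-disk array below are hypothetical figures chosen only for illustration, and the model ignores rebuild windows.

```python
# Illustrative only: assumed per-disk annual failure probability and array size.
p = 0.03      # hypothetical chance that any one disk fails in a year
n = 6         # disks in the array

# RAID-0 loses data if *any* disk fails.
p_raid0_loss = 1 - (1 - p) ** n

# A single-parity or mirrored array loses data only if two or more disks fail.
p_no_failure = (1 - p) ** n
p_one_failure = n * p * (1 - p) ** (n - 1)
p_protected_loss = 1 - p_no_failure - p_one_failure

print(f"RAID-0 data-loss probability:        {p_raid0_loss:.3f}")      # ~0.167
print(f"Single-parity data-loss probability: {p_protected_loss:.3f}")  # ~0.012
```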
If a drive in a RAID-1
mirror or duplex array fails, a full, complete, exact copy of the data
remains. Access to your data or application is unimpeded, and
performance degradation is minimal, although you do lose the benefit
gained on read operations of being able to read from either disk. Until
the failed disk is replaced, however, you have no fault tolerance on the
remaining disk. Once you replace the failed disk, overall performance
is significantly reduced while the new disk is initialized and the
mirror is rebuilt.
In a RAID-3 or RAID-4
array, if one of the data disks fails, a significant performance
degradation occurs because the missing data needs to be reconstructed
from the parity information. Also, you’ll have no fault tolerance until
the failed disk is replaced. If it is the parity disk that fails, you’ll
have no fault tolerance until it is replaced, but also no performance
degradation. Once you replace the failed disk, overall performance is
significantly reduced while the new disk is initialized and the parity
information or data is rebuilt.
In a RAID-5 array, the
loss of any disk results in a significant performance degradation, and
your fault tolerance will be gone until you replace the failed disk.
Once you replace the disk, you won’t return to fault tolerance until the
entire array has a chance to rebuild itself, and performance is
seriously degraded during the rebuild process.
RAID systems that are
arrays of arrays can provide for multiple failure tolerance. These
arrays provide for multiple levels of redundancy and are appropriate for
mission-critical applications that must be able to withstand the
failure of more than one drive in an array.
Availability
All
levels of RAID, except RAID-0, provide higher availability than a
single drive. However, if availability is expanded to also include the
overall performance level during failure mode, some RAID levels provide
definite advantages over others. Specifically, RAID-1,
mirroring/duplexing, provides enhanced availability when compared to
RAID levels 3, 4, and 5 during failure mode. There is minimal
performance degradation when compared to a single disk if one half of a
mirror fails, whereas a RAID-5 array has substantially compromised
performance until the failed disk is replaced and the array is rebuilt.
In addition, RAID
systems that are based on an array of arrays can provide higher
availability than RAID levels 1 through 5. Running on multiple
controllers, these arrays are able to tolerate the failure of more than
one disk and the failure of one of the controllers, providing protection
against the single point of failure inherent in any single-controller
arrangement. RAID-1 that uses duplexed disks running on different
controllers—as opposed to RAID-1 that uses mirroring on the same
controller—also provides this additional protection and improved
availability.
Hot-swap drives
and hot-spare drives can further
improve availability in critical environments; this is especially true
for hot-spare drives. By providing for automatic failover and
rebuilding, they can reduce your exposure to catastrophic failure and
provide for maximum availability.
Performance
The relative
performance of each RAID level depends on the intended use. The best
compromise for many situations is arguably RAID-5, but you should be
suspicious of that compromise if your application is fairly write
intensive. Especially for relational database data and index files where
the database is moderately or highly write intensive, the performance
hit of using RAID-5 can be substantial. A better alternative is to use
RAID-0+1 or RAID-10.
Whatever level of RAID
you choose for your particular application, it will benefit from using
more small disks rather than a few large disks. The more drives
contributing to the stripe of the array, the greater the benefit of
parallel reading and writing you’ll be able to realize—and your array’s
overall speed will improve.
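As a rough sketch of that point, the estimate below assumes that sequential throughput scales with the number of data-carrying spindles, which real arrays only approximate; the 40 MB/s per-drive figure is a made-up placeholder.

```python
# Illustrative only: real arrays fall short of perfectly linear scaling.
PER_DRIVE_MB_S = 40          # hypothetical sustained throughput per drive

def stripe_throughput(total_drives, parity_drives=0):
    """Approximate streaming read throughput of a striped array in MB/s."""
    return (total_drives - parity_drives) * PER_DRIVE_MB_S

print(stripe_throughput(4, parity_drives=1))   # 3 data drives -> 120 MB/s
print(stripe_throughput(8, parity_drives=1))   # 7 data drives -> 280 MB/s
```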
Cost
The delta in cost
between RAID configurations is primarily the cost of drives, potentially
including the cost of additional array enclosures because more drives
are required for a particular level of RAID. RAID-1, either duplexing or
mirroring, is the most expensive of the conventional RAID levels,
because it requires at least 33 percent more raw disk space for a given
amount of net storage space than other RAID levels.
Another
consideration is that RAID levels that include mirroring or duplexing
must use drives in pairs. Therefore, it’s more difficult (and more
expensive) to add on to an array if you need additional space on the
array. A net 36-GB RAID-0+1 array, comprising four 18-GB drives,
requires four more 18-GB drives to double in size, a somewhat daunting
prospect if your array cabinet has bays for only six drives, for
example. A net 36-GB RAID-5 array of three 18-GB drives, however, can be
doubled in size simply by adding two more 18-GB drives, for a total of
five drives.
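The capacity arithmetic behind these comparisons is easy to capture in a few lines. The sketch below computes usable space from the drive count for the levels discussed, using the same hypothetical 18-GB drives as the example above.

```python
DRIVE_GB = 18  # drive size used in the example above

def net_capacity(level, drives):
    """Usable capacity in GB for a given RAID level and drive count."""
    if level == "raid0":
        return drives * DRIVE_GB
    if level in ("raid1", "raid0+1", "raid10"):
        return drives // 2 * DRIVE_GB          # mirroring halves the usable space
    if level == "raid5":
        return (drives - 1) * DRIVE_GB         # one drive's worth goes to parity
    raise ValueError(level)

print(net_capacity("raid0+1", 4))   # 36 GB net from 72 GB raw
print(net_capacity("raid0+1", 8))   # 72 GB net: doubling took four more drives
print(net_capacity("raid5", 3))     # 36 GB net from 54 GB raw
print(net_capacity("raid5", 5))     # 72 GB net: doubling took only two more
```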
Hot-Swap and Hot-Spare Disk Systems
Hardware RAID systems
can provide for both hot-swap and hot-spare capabilities. A hot-swap
disk system allows failed hard disks to be removed and a replacement
disk inserted into the array without powering down the system or
rebooting the server. When the new disk is inserted, it is automatically
recognized and either will be automatically configured into the array
or can be manually configured into it. Additionally, many hot-swap RAID
systems allow you to add hard disks into empty slots dynamically,
automatically or manually increasing the size of the RAID volume on the
fly without a reboot.
A hot-spare
RAID configuration uses an additional, preconfigured disk or disks to
automatically replace a failed disk. These systems usually don't also
support hot-swapping, so the failed disk can't be removed until the
system can be powered down, but full fault tolerance is maintained in
the meantime by having the hot spare available.
Distributed File System
The Distributed
File System (DFS) is primarily a method of simplifying the view that
users have of the available storage on a network—but it is also, when
configured appropriately, a highly fault-tolerant storage mechanism. By
configuring your DFS root on a Windows Server 2003 domain controller,
you can create a fault-tolerant, replicated, distributed file system
that gives you great flexibility while presenting your user community
with a cohesive and easy-to-navigate network file system.
When you create a
fault-tolerant DFS root on a domain controller and replicate it and the
links below it across multiple servers, you create a highly
fault-tolerant file system that has the added benefit of distributing
the load evenly across the replicated shares, giving you a substantial
scalability improvement as well.
Clustering
Windows
Server 2003 supports two different kinds of high availability
clustering, either of which can greatly improve your fault tolerance:
For many
TCP/IP-based applications, the Network Load Balancing service provides a
simple, “shared nothing,” fault-tolerant application server.
Server
clusters provide a highly available fault-tolerant environment that can
run applications, provide network services, and distribute loads.
Server clusters are available only with Windows Server 2003, Enterprise
Edition, and Windows Server 2003, Datacenter Edition.
Network Load Balancing
The Network Load
Balancing service allows TCP/IP-based applications to be spread
dynamically across up to 32 servers. If a particular server fails, the
load and connections to that server are dynamically balanced to the
remaining servers, providing a highly fault-tolerant environment without
the need for specialized, shared hardware. Individual servers within
the cluster can have different hardware and capabilities, and the
overall job of load balancing and failover happens automatically, with
each server in the cluster running its own copy of Wlbs.exe, the Network
Load Balancing control program.
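The toy sketch below illustrates the general idea behind this kind of shared-nothing balancing: each host independently computes, from a hash of the client, which surviving host owns a connection, so no central dispatcher is needed and a failed host's clients are re-divided automatically. It is not the algorithm the Network Load Balancing service or Wlbs.exe actually uses; the hashing scheme and host names are invented for illustration.

```python
import hashlib

def owner(client_ip, hosts):
    """Decide which host handles a client (highest-random-weight hashing).

    Every host can run this same computation independently and agree on
    the answer, which is the essence of a shared-nothing design. Purely
    illustrative; not NLB's actual partitioning algorithm.
    """
    return max(hosts, key=lambda h: hashlib.md5(f"{client_ip}|{h}".encode()).hexdigest())

hosts = ["web01", "web02", "web03", "web04"]
clients = [f"10.0.0.{i}" for i in range(1, 9)]

before = {c: owner(c, hosts) for c in clients}
hosts.remove("web02")          # simulate a server failure
after = {c: owner(c, hosts) for c in clients}

# Only the clients that were mapped to the failed host move; the rest
# keep their original server.
moved = [c for c in clients if before[c] != after[c]]
print(moved, all(before[c] == "web02" for c in moved))
```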
Server Clusters
Server clusters generally
use a shared resource between nodes of the cluster. This resource is
generally a shared SCSI or Fibre Channel–attached disk array. Each
server in the cluster is connected to the shared resource, and the
common database that manages the clustering is stored on this shared
disk resource. Nodes in the cluster generally have identical hardware
and identical capabilities, although it is technically possible to
create a server cluster with dissimilar nodes. Windows Server 2003
supports server clusters of up to eight nodes.
Server clusters
provide a highly fault-tolerant and configurable environment for
mission-critical services and applications. Applications don’t need to
be specially written to be able to take advantage of the fault tolerance
of a server cluster, although if the application is written to be
cluster-aware, it can take advantage of additional controls and
features in failover and fallback scenarios.