Windows Server 2008 R2 contains a rebuilt backup
program appropriately named Windows Server Backup. Windows Server Backup
can be used to back up each cluster node and any cluster disks that are
currently online on the local node. Also, the System State of the
cluster node can be backed up individually or as part of a complete
system backup.
To successfully back up and
restore the entire cluster or a single cluster node, the cluster
administrator must first understand how to troubleshoot, back up, and
restore a standalone Windows Server 2008 R2 system using Windows Server
Backup. The process of backing up cluster nodes is the same as
for a standalone server, but restoring a cluster might require
additional steps or configurations that do not apply to a standalone
server. To be prepared to recover different types of cluster failures,
you must take the following steps on each cluster node:
Back up each cluster node’s local disks.
Back up each cluster node’s System State.
Back up the cluster quorum from any node running in the cluster.
For failover clusters using shared storage, back up shared cluster disks from the node on which the disks are currently hosted.
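The backup steps above can be sketched as a small Python helper that assembles wbadmin command lines for one node. This is a minimal sketch, not a definitive procedure: the backup target E:, the local volume C:, and the shared disk S: are hypothetical placeholders, and the helper only builds the command strings, it does not run them. The shared disk should be included only on the node that currently hosts it.

```python
from typing import Sequence


def system_state_backup_cmd(target: str) -> str:
    """Build a wbadmin command line for a System State backup of the local node."""
    return f"wbadmin start systemstatebackup -backupTarget:{target} -quiet"


def volume_backup_cmd(target: str, volumes: Sequence[str]) -> str:
    """Build a wbadmin command line that backs up the listed volumes.

    -allCritical also pulls in every volume the operating system needs.
    """
    include = ",".join(volumes)
    return (
        f"wbadmin start backup -backupTarget:{target} "
        f"-include:{include} -allCritical -quiet"
    )


def node_backup_plan(target: str, local: Sequence[str], hosted_shared: Sequence[str]) -> list:
    """Return the commands for one node: System State, then local and hosted shared disks."""
    return [
        system_state_backup_cmd(target),
        volume_backup_cmd(target, list(local) + list(hosted_shared)),
    ]


# Hypothetical example: this node hosts shared disk S: and backs up to E:.
plan = node_backup_plan("E:", ["C:"], ["S:"])
```

Run the resulting commands from an elevated command prompt on each node; a node that hosts no shared disks would pass an empty list for the last argument.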
Failover Cluster Node—Backup Best Practices
As
a backup best practice for cluster nodes, administrators should strive
to back up everything as frequently as possible. Because cluster
availability is so important, here are some recommendations for cluster
node backup:
Back up each cluster node’s System State daily and immediately before and after a cluster configuration change is made.
Back up each cluster node’s local drives and System State daily if the schedule permits, or weekly if daily backups cannot be performed.
Back up cluster shared drives daily if the schedule permits or weekly if daily backups cannot be performed.
Using
Windows Server Backup, perform a full system backup before any major
changes occur and monthly if possible. If a full system backup is
scheduled using Windows Server Backup, this task is already being
performed.
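As a rough illustration of these recommendations, the following Python sketch flags which backup types are overdue for a node. The maximum ages encode the daily and monthly targets from the list above; the dictionary keys and the function name are hypothetical and chosen only for this example.

```python
from datetime import datetime, timedelta

# Maximum acceptable age per backup type, following the recommendations above:
# daily System State and drive backups, and a monthly full system backup.
MAX_AGE = {
    "system_state": timedelta(days=1),
    "local_drives": timedelta(days=1),
    "shared_drives": timedelta(days=1),
    "full_system": timedelta(days=31),
}


def overdue_backups(last_run: dict, now: datetime) -> list:
    """Return the backup types that were never run or are older than allowed."""
    return sorted(
        kind
        for kind, max_age in MAX_AGE.items()
        if kind not in last_run or now - last_run[kind] > max_age
    )
```

Where daily backups are not possible, the one-day limits could be relaxed to seven days to match the weekly fallback.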
Restoring an Entire Cluster to a Previous State
Changes to a cluster should be made with caution and, if at all
possible, should be tested in an isolated, nonproduction lab environment
first. When cluster changes have been implemented and produce
undesirable effects, the way to roll back the cluster configuration to a
previous state is to restore the cluster configuration to all nodes.
This process is simpler than it sounds and is performed from only one
node. There are only two caveats to this process:
All the cluster nodes
that were members of the cluster previously need to be currently
available and operational in the cluster. For example, if Cluster1 was
made up of Server1 and Server2, both of these nodes need to be active in
the cluster before the previous cluster configuration can be rolled
back.
To
restore a previous cluster configuration to all cluster nodes, the
entire cluster needs to be taken offline long enough to restore the
backup, reboot the node from which the backup was run, and manually
start the cluster service on all remaining nodes.
To restore an entire cluster to a previous state, perform the following steps:
1. Log on to one of the Windows Server 2008 R2 cluster nodes with an account that has administrator privileges over all nodes in the cluster. (The node should have a full system backup available for recovery.)
2. Click Start, click All Programs, click Accessories, and select Command Prompt.
3. At the command prompt, type wbadmin get versions and press Enter to list the available backups. For this example, the backup version is 09/16/2009-08:30, as shown by its version identifier.
4. After the correct backup version is known, type the following command and press Enter: wbadmin start recovery -version:09/16/2009-08:30 -itemType:App -items:Cluster (where the -version value is the version identifier of the backup to restore).
5. Wbadmin returns a prompt stating that this command will perform an authoritative restore of the cluster and restart the cluster services, as shown in Figure 1. Type Y and press Enter to start the authoritative cluster restore.
6. When the restore completes, the cluster service must be started on each node in the cluster to complete the process. The restore operation might have done this already, but check each node to verify that the cluster service is indeed started.
7. Open the Failover Cluster Manager console to verify that the restore has completed successfully. Close the console and log off of the server when you are finished.
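The version lookup and recovery command from the steps above can be sketched in Python. This is a sketch under stated assumptions: SAMPLE_OUTPUT is illustrative text laid out like the output of wbadmin get versions, not captured output, and the helper names are hypothetical. The code only extracts version identifiers and builds the recovery command string; it does not start a restore.

```python
import re

# Illustrative text only, assumed to resemble "wbadmin get versions" output.
SAMPLE_OUTPUT = """\
Backup time: 9/15/2009 8:30 AM
Version identifier: 09/15/2009-08:30
Can recover: Volume(s), File(s), Application(s), Bare Metal Recovery, System State

Backup time: 9/16/2009 8:30 AM
Version identifier: 09/16/2009-08:30
Can recover: Volume(s), File(s), Application(s), Bare Metal Recovery, System State
"""

VERSION_RE = re.compile(r"^Version identifier:[ \t]*(\S+)", re.MULTILINE)


def version_identifiers(output: str) -> list:
    """Return every version identifier found in the listing, in order."""
    return VERSION_RE.findall(output)


def cluster_recovery_cmd(version: str) -> str:
    """Build the authoritative cluster restore command for one backup version."""
    return f"wbadmin start recovery -version:{version} -itemType:App -items:Cluster"


# Pick the last listed version and build the command to type at the prompt.
latest = version_identifiers(SAMPLE_OUTPUT)[-1]
command = cluster_recovery_cmd(latest)
```

Building the command from the parsed identifier avoids retyping the version string by hand, which is where stray spaces after -version: tend to creep in.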
Deploying Multisite or Stretch Geographically Dispersed Failover Clusters
Geographically dispersed failover clusters are failover clusters that
include cluster nodes deployed in multiple physical locations. The terms
stretch and multisite describe how the locations are connected: in a
stretch cluster, the locations share a common network that is extended
across the WAN, whereas in a multisite cluster, the cluster nodes are
members of different Active Directory sites. By definition, Active
Directory sites are distinguished by the networks they contain, so nodes
in different sites reside on different networks. Geographically
dispersed failover clusters are not easy to deploy because each
organization’s network configuration might require different tuning
parameters within the failover cluster and the Services and Applications
group resource properties. Some special considerations for geoclusters
are as follows:
Data replication is not performed by the cluster and must be performed using a third-party hardware or software solution.
If an even number of nodes is deployed, with an equal number of nodes
in each location, and the Node and File Share Majority quorum
configuration is used, hosting the file share in either of the two sites
creates a risk: if that site becomes inaccessible, the remaining site
holds only half of the node votes and cannot reach the file share, so it
cannot achieve quorum and will not be able to return to operation.
In this case, it might be necessary to host the file share in another
site to add some resilience to the multisite cluster.
If the failover cluster will span multiple subnets, decide how the IP
address resources will be configured. You can create multiple IP address
resources in the Services and Applications group, one for each network,
but you must carefully restrict each IP address resource so that it can
run only on the nodes in the group that are in its respective subnet.
For multisite failover clusters with different IP address resources for
each network, the Network Name resource dependency will need to be
adjusted so that the Network Name can start when any one of the IP
address resources is online, without requiring all of them. In other
words, all IP address resources should be added as dependencies of the
Network Name resource but should be listed as OR dependencies, as shown
in Figure 2.
DNS
record registration settings might need to be adjusted, particularly for
Services and Applications groups that contain Network Name resources
with multiple IP addresses in different subnets. Changing the DNS record
TTL settings that the Network Name resource will use when it performs
dynamic registration can directly affect client communication after a
failover. If the client cannot resolve the network name to the correct
IP address, it does not matter if the failover cluster is online or not.
These settings can be changed using the cluster.exe utility.
Cluster heartbeat communication settings might need to be adjusted
based on network usage and response times. Exhaustive testing under
different network conditions is needed to establish whether the default
heartbeat settings are sufficient, or whether network latency could
cause the cluster to unexpectedly conclude that the nodes in a site are
offline. These settings can be changed using the cluster.exe utility.
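Three of the considerations above reduce to simple checks, sketched here in Python. The sketch makes several assumptions: one vote per node plus one for the file share, a strict majority needed for quorum, and a heartbeat tolerance equal to the delay between heartbeats multiplied by the number of missed heartbeats allowed. The function names are hypothetical, and the example values of 1000 ms and 5 missed heartbeats are illustrative, so confirm the actual settings on your own cluster.

```python
from typing import Iterable


def surviving_site_has_quorum(nodes_per_site: int, witness_site: str, failed_site: str) -> bool:
    """Two-site cluster with equal nodes per site plus one file share vote.

    The surviving site keeps its own node votes and gains the file share
    vote only if the share is not hosted in the failed site.
    """
    total_votes = 2 * nodes_per_site + 1
    reachable = nodes_per_site + (0 if witness_site == failed_site else 1)
    return 2 * reachable > total_votes  # strict majority


def network_name_can_start(ip_resources_online: Iterable[bool]) -> bool:
    """OR dependency: one online IP address resource is enough."""
    return any(ip_resources_online)


def heartbeat_tolerance_seconds(delay_ms: int, missed_threshold: int) -> float:
    """Approximate silence tolerated before a node is considered down."""
    return delay_ms * missed_threshold / 1000


# File share in site A, site A fails: site B holds 2 of 5 votes.
share_in_failed_site = surviving_site_has_quorum(2, "A", "A")
# File share in a third site C, site A fails: site B holds 3 of 5 votes.
share_in_third_site = surviving_site_has_quorum(2, "C", "A")
```

With two nodes per site, the first case leaves the surviving site without quorum while the second keeps it running, which is the reason for hosting the file share in a third location. Comparing heartbeat_tolerance_seconds against the longest latency spike seen in testing gives a quick indication of whether the heartbeat settings need to be relaxed.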