Windows Server 2008 R2 : Backing Up and Restoring Failover Clusters

4/14/2011 6:26:02 PM

Windows Server 2008 R2 contains a rebuilt backup program appropriately named Windows Server Backup. Windows Server Backup can be used to back up each cluster node and any cluster disks that are currently online on the local node. Also, the System State of the cluster node can be backed up individually or as part of a complete system backup.

To successfully back up and restore the entire cluster or a single cluster node, the cluster administrator must first understand how to troubleshoot, back up, and restore a standalone Windows Server 2008 R2 system using Windows Server Backup. The process of backing up cluster nodes is the same as for a standalone server, but restoring a cluster might require additional steps or configurations that do not apply to a standalone server. To be prepared to recover from different types of cluster failures, you must take the following steps on each cluster node:

Back up each cluster node’s local disks.
Back up each cluster node’s System State.
Back up the cluster quorum from any node running in the cluster.
For failover clusters using shared storage, back up shared cluster disks from the node on which the disks are currently hosted.
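
The local-disk, System State, and shared-disk items in the checklist above can each be expressed as a wbadmin command line. The following Python sketch only builds those command lines and does not run anything. The backup target E: and the shared cluster disk S: are hypothetical drive letters chosen for illustration, and the helper function names are not part of any Microsoft tool.

```python
# Minimal sketch: build (but do not execute) wbadmin command lines
# for the per-node backups listed above.
# Assumptions: E: is a dedicated backup volume and S: is a shared
# cluster disk currently hosted on this node (both hypothetical).

def system_state_backup_cmd(target):
    """Command line for a System State backup of the local node."""
    return ["wbadmin", "start", "systemstatebackup",
            f"-backupTarget:{target}", "-quiet"]

def volume_backup_cmd(target, volumes, all_critical=False):
    """Command line for a one-time backup of the listed volumes.

    all_critical=True adds -allCritical so every volume needed for
    system recovery is included alongside the listed ones.
    """
    args = ["wbadmin", "start", "backup",
            f"-backupTarget:{target}",
            f"-include:{','.join(volumes)}"]
    if all_critical:
        args.append("-allCritical")
    args.append("-quiet")
    return args

if __name__ == "__main__":
    print(" ".join(system_state_backup_cmd("E:")))
    print(" ".join(volume_backup_cmd("E:", ["C:"], all_critical=True)))
    # Shared disks are backed up only from the node that hosts them.
    print(" ".join(volume_backup_cmd("E:", ["S:"])))
```

Keeping the commands as argument lists makes them easy to hand to a scheduler or to subprocess.run later, once the drive letters have been checked against the real node.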

Failover Cluster Node—Backup Best Practices

As a backup best practice for cluster nodes, administrators should strive to back up everything as frequently as possible. Because cluster availability is so important, here are some recommendations for cluster node backup:

Back up each cluster node’s System State daily and immediately before and after a cluster configuration change is made.
Back up cluster local drives and System State daily if the schedule permits or weekly if daily backups cannot be performed.
Back up cluster shared drives daily if the schedule permits or weekly if daily backups cannot be performed.
Using Windows Server Backup, perform a full system backup before any major changes occur and monthly if possible. If a full system backup is scheduled using Windows Server Backup, this task is already being performed.

Restoring an Entire Cluster to a Previous State

Changes to a cluster should be made with caution and, if at all possible, should be tested in a nonproduction isolated lab environment first. When cluster changes have been implemented and deliver undesirable effects, the way to roll back the cluster configuration to a previous state is to restore the cluster configuration to all nodes. This process is simpler than it sounds and is performed from only one node. There are only two caveats to this process:

All the cluster nodes that were members of the cluster previously need to be currently available and operational in the cluster. For example, if Cluster1 was made up of Server1 and Server2, both of these nodes need to be active in the cluster before the previous cluster configuration can be rolled back.
To restore a previous cluster configuration to all cluster nodes, the entire cluster needs to be taken offline long enough to restore the backup, reboot the node from which the backup was run, and manually start the cluster service on all remaining nodes.

To restore an entire cluster to a previous state, perform the following steps:

1.	Log on to one of the Windows Server 2008 R2 cluster nodes with an account with administrator privileges over all nodes in the cluster. (The node should have a full system backup available for recovery.)
2.	Click Start, click All Programs, click Accessories, and select Command Prompt.
3.	At the command prompt, type wbadmin get versions to reveal the list of available backups. For this example, our backup version is named 09/16/2009-08:30 as defined by the version identifier.
4.	After the correct backup version is known, type the following command, where the value after -version: is the version identifier of the chosen backup, and press Enter: wbadmin start recovery -version:09/16/2009-08:30 -itemType:App -items:Cluster
5.	Wbadmin returns a prompt stating that this command will perform an authoritative restore of the cluster and restart the cluster services, as shown in Figure 1. Type in Y and press Enter to start the authoritative cluster restore.

Figure 1. Performing an authoritative restore of the cluster configuration.
6.	When the restore completes, each node in the cluster needs to have the cluster service started to complete the process. This might have been performed by the restore operation, but each node should be checked to verify that the cluster service is indeed started.
7.	Open the Failover Cluster Manager console to verify that the restore has completed successfully. Close the console and log off of the server when you are finished.
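
Steps 3, 4, and 6 above lend themselves to a small helper. The Python sketch below extracts version identifiers from text in the shape that wbadmin get versions prints, builds the authoritative restore command for a chosen version, and builds an sc query for the cluster service (ClusSvc) on another node. The sample output is illustrative rather than captured from a real server, the node name Server2 comes from the example earlier in this section, and the function names are hypothetical. Nothing is executed.

```python
import re

# Illustrative text in the shape of "wbadmin get versions" output;
# not captured from a real system.
SAMPLE_OUTPUT = """\
Backup time: 9/16/2009 8:30 AM
Backup target: Fixed Disk labeled E:
Version identifier: 09/16/2009-08:30
Can recover: Volume(s), File(s), Application(s), Bare Metal Recovery, System State
"""

VERSION_LINE = re.compile(r"^Version identifier:\s*(\S+)\s*$", re.MULTILINE)
VERSION_FORMAT = re.compile(r"\d{2}/\d{2}/\d{4}-\d{2}:\d{2}")

def list_versions(text):
    """Return every version identifier found in the given output."""
    return VERSION_LINE.findall(text)

def cluster_restore_cmd(version):
    """Build the authoritative cluster restore command for one version."""
    if not VERSION_FORMAT.fullmatch(version):
        raise ValueError(f"unexpected version identifier: {version!r}")
    return ["wbadmin", "start", "recovery",
            f"-version:{version}", "-itemType:App", "-items:Cluster"]

def cluster_service_query_cmd(node):
    """Build an sc query for the cluster service on a remote node."""
    return ["sc", rf"\\{node}", "query", "ClusSvc"]

if __name__ == "__main__":
    versions = list_versions(SAMPLE_OUTPUT)
    print(versions)
    print(" ".join(cluster_restore_cmd(versions[-1])))
    print(" ".join(cluster_service_query_cmd("Server2")))
```

Validating the identifier before building the command guards against the most common copy-and-paste mistake, a stray space after -version:.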

Deploying Multisite or Stretch Geographically Dispersed Failover Clusters

Geographically dispersed failover clusters are failover clusters that include cluster nodes deployed in multiple physical locations. The terms stretch and multisite distinguish two designs: in a stretch cluster, the locations share a common network that is extended across the WAN; in a multisite cluster, the cluster nodes are members of different Active Directory sites. Because Active Directory sites are defined by the networks they contain, nodes in different sites reside on different networks. Geographically dispersed failover clusters are not easy to deploy because each organization's network configuration might require different tuning parameters in the failover cluster and in the Services and Applications group resource properties. Some special considerations for geoclusters are as follows:

Data replication is not performed by the cluster and must be performed using a third-party hardware or software solution.
If an even number of nodes is deployed, split equally between the two locations, and the Node and File Share Majority quorum configuration is used, hosting the file share in either of the sites creates a weakness: if that site becomes inaccessible, the remaining site is left without a majority of votes and will not be able to return to operation. In this case, it might be necessary to host the file share in a third site to add some resilience to the multisite cluster.
If the failover cluster will span multiple subnets, decide how the IP address resources will be configured. You can create multiple IP address resources in the Services and Applications group, one for each network, but you will need to carefully restrict each IP address resource so that it can only run on the nodes in the group that are in its subnet.
For multisite failover clusters with different IP address resources for each network, the Network Name resource dependency will need to be adjusted so that the Network Name can start when either of the IP address resources is online, without requiring both. In other words, all IP address resources should be added as dependencies of the Network Name resource but should be listed as OR dependencies, as shown in Figure 2.

Figure 2. Adjusting the Network Name resource dependencies for Services and Applications groups with multiple IP address resources.
DNS record registration settings might need to be adjusted, particularly for Services and Applications groups that contain Network Name resources with multiple IP addresses in different subnets. Changing the DNS record TTL settings that the Network Name resource will use when it performs dynamic registration can directly affect client communication after a failover. If the client cannot resolve the network name to the correct IP address, it does not matter if the failover cluster is online or not. These settings can be changed using the cluster.exe utility.
Cluster heartbeat communication settings might need to be adjusted based on network usage and response times. Exhaustive testing under different network conditions is needed to determine whether the default heartbeat settings are sufficient or whether network latency could cause the cluster to wrongly conclude that the nodes in a site are offline. These settings can be changed using the cluster.exe utility.
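
The quorum consideration in the list above comes down to vote arithmetic. The Python sketch below is a simplified model, not cluster code: it assumes that under Node and File Share Majority each node and the file share witness hold one vote, and that a partition keeps quorum only when it can reach more than half of all votes. The function name and the four-node, two-site layout are hypothetical.

```python
def has_quorum(reachable_nodes, total_nodes, witness_reachable):
    """Return True if the surviving partition holds a majority of votes.

    Simplified model: one vote per node plus one vote for the
    file share witness.
    """
    total_votes = total_nodes + 1
    votes = reachable_nodes + (1 if witness_reachable else 0)
    return votes > total_votes // 2

if __name__ == "__main__":
    # Four nodes, two per site. Site A fails and site B survives.
    # Witness hosted in site A: site B reaches 2 of 5 votes.
    print(has_quorum(2, 4, witness_reachable=False))
    # Witness hosted in a third site: site B reaches 3 of 5 votes.
    print(has_quorum(2, 4, witness_reachable=True))
```

With the witness in the failed site, the surviving two nodes hold 2 of 5 votes and cannot form a majority. With the witness in a third site they hold 3 of 5 and can.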
