Windows Server 2008 R2 high-availability and recovery features : Planning for Backups and Disaster Recovery

4/27/2014 2:33:58 AM

One of the most important, and most often overlooked, functions of administering a Windows network is planning for and implementing good backup and recovery solutions. Too often, administrators learn to implement sound recovery processes and practices only after surviving a disaster in which they lost data that was critical to the company. If you want to be a good administrator, plan for worst-case scenarios and hope for the best.

Disaster recovery planning

In most organizations, planning for disaster encompasses a lot more than just the IT department. How a company defines a disaster varies from one organization to another. A small business may consider a disaster to be the complete loss of the office building. A larger organization could consider the loss of a single critical system a disaster. It really depends on your recovery needs and how dependent your organization is on the systems that support it. Planning for a major disaster requires involvement from many parts of the company and extends beyond the formal IT organization. For example, if the company loses an entire building because of fire, plans will need to be put into action that cover logistics around facilities, communications, emergency services, IT, and so on. Do not make disaster recovery an “IT thing”; make it a business thing. Work with the various business units to determine which systems are critical to business processes, and ensure that you have a good plan in place for those first. Work with the owner of each business application to determine a realistic value for each of the following:

Recovery Point Objective (RPO)—The RPO is the longest acceptable data loss expressed in time. For example, can your organization function if the given system loses the last 24 h of data?
Recovery Time Objective (RTO)—The RTO is the acceptable amount of downtime permitted for a given system. For example, the organization may only be able to tolerate an email outage of 4 h in the event of a disaster. This means that email must be back up, with messages flowing, within 4 h of the disaster.
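
As a rough illustration of how these two objectives can be checked against a plan, the following Python sketch compares a system's backup interval with its RPO and its tested restore time with its RTO. The SystemPlan class, its field names, and the sample figures are hypothetical and exist only for this example; the worst-case assumption is that a failure occurs just before the next backup would have run.

```python
from dataclasses import dataclass


@dataclass
class SystemPlan:
    """Hypothetical record of one system's objectives and current plan."""
    name: str
    rpo_hours: float              # longest acceptable data loss
    rto_hours: float              # longest acceptable downtime
    backup_interval_hours: float  # time between backups
    restore_hours: float          # restore time measured in a recovery test


def check_plan(plan: SystemPlan) -> list[str]:
    """Return a description of each objective the plan would miss."""
    problems = []
    # Worst case: the failure happens just before the next backup runs,
    # so everything since the previous backup is lost.
    if plan.backup_interval_hours > plan.rpo_hours:
        problems.append(
            f"{plan.name}: backup interval of {plan.backup_interval_hours} h "
            f"exceeds the RPO of {plan.rpo_hours} h"
        )
    if plan.restore_hours > plan.rto_hours:
        problems.append(
            f"{plan.name}: restore time of {plan.restore_hours} h "
            f"exceeds the RTO of {plan.rto_hours} h"
        )
    return problems


# Sample figures (assumed): nightly backups, a 24 h RPO, a 4 h RTO,
# and a restore that took 6 h when it was last tested.
email = SystemPlan("email", rpo_hours=24, rto_hours=4,
                   backup_interval_hours=24, restore_hours=6)
for problem in check_plan(email):
    print(problem)
```

In this sample the nightly backup satisfies the 24 h RPO, but the 6 h restore misses the 4 h RTO, which suggests that a faster recovery technology is needed for that system.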

As you build your disaster plan, document and test it. You do not want to guess at critical decisions in the time of crisis.

As you begin to build your disaster recovery plan, consider the various technologies that can be used to provide recovery from a disaster. These can include clustering technologies, offsite replication services, or traditional backups. For example, you might want to support a critical file server with a geo-cluster that fails over automatically in the event of a disaster. On the other hand, you may only want to perform tape backups of your print server if it is not deemed critical. Again, you need to thoroughly document each system and the technologies you can use to recover it. Document the recovery process in such a way that another person could perform the recovery if you are unable to do so.

After determining the method to use for recovery, implement and test it on a regular basis. Without regular testing, there is no guarantee that your recovery process will work as you expect in the time of a real disaster.

As part of your disaster recovery planning, you will need to plan for and implement a good backup strategy. Even with a good disaster recovery plan, you will still need to back up data using traditional backup methods, both for data retention and for worst-case scenarios (such as a failure of the disaster recovery process itself).

Backups

Creating a good backup strategy could very well be one of the toughest aspects of your job as an administrator. This strategy should be an evolving process that is modified as necessary to keep pace with the systems that support your organization’s business functions. Again, you will want to understand from the business perspective how important a given application is to your organization. It may be determined that a SQL server must be backed up every 4 h, yet an application server may need only a weekly backup.

Depending on the size of your organization and the number of servers you manage, you may need to consider an enterprise backup solution as opposed to the built-in Windows Server Backup. Microsoft offers its own enterprise solution as part of the System Center suite of products. System Center Data Protection Manager (DPM) can be used to back up Windows servers as well as applications such as SQL, Exchange, and SharePoint servers.

A common backup strategy is a disk-based solution, also known as Disk-to-Disk-to-Tape (D2D2T). These solutions back up data to disk drives first, which allows for quick recoveries. After a defined period of time, the backup is moved from disk to tape, where it can be taken offsite for long-term retention. Backups tend to be performed using one or a combination of the following backup types:

Full backup—This backup type creates a backup of all selected files and folders. When a full backup is complete, the data is marked as being backed up.
Incremental backup—An incremental backup only backs up changes made since the last time a backup was completed. For example, if a backup was completed yesterday, and four files change during the day, an incremental backup will back up only those four files. A recovery would require restoring the full backup and then the incremental. If multiple incrementals are run between full backups, all of the incremental backup sets will be required, in addition to the full backup, when restoring data.
Differential backup—A differential backup backs up only those files that have changed since the last full backup. Like an incremental backup, a differential only backs up changes; however, it backs up all changes since the last full backup rather than since the last backup of any type. For example, if a full backup is run on Monday, and differentials are run on Tuesday, Wednesday, and Thursday, each differential will back up all changes that took place after Monday's full backup. A recovery requires only the full backup and the most recent differential.
Synthetic full—A synthetic full backup creates a full backup from the most recent full backup plus subsequent incremental and/or differential backups. The resulting synthetic full backup is identical to what would have been created from a full backup without the need to transfer data from the client computer to the backup media. Synthetic full backups can greatly enhance restore processes, especially if a given full backup cycle contains many incremental backup sets.
Transaction log backup—Transaction log backups are used to rapidly back up the logs used by transaction-based systems such as database servers. Since transaction logs are small, they can be backed up rapidly, allowing for point-in-time copies of the data to be taken on a regular basis. For example, a full database backup can be taken at night with transaction log backups taken every 6 h. If the SQL server failed, the data could be restored to the point in time of the most recent transaction log backup, which would be no more than 6 h old.
Real-time data protection (RDP)—RDP constantly monitors the data for changes and backs up all changes as they are made. This provides for a restore of data within minutes of the time it was lost.
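
The difference between incremental and differential restores described above can be sketched in Python. This is a minimal illustration and not how any particular backup product catalogs its media: the BackupSet class and restore_chain function are hypothetical names, and the backup history is assumed to be listed oldest to newest.

```python
from dataclasses import dataclass


@dataclass
class BackupSet:
    """Hypothetical catalog entry for one backup run."""
    label: str
    kind: str  # "full", "incremental", or "differential"


def restore_chain(history: list[BackupSet]) -> list[BackupSet]:
    """Return the sets needed, in restore order, to reach the newest backup.

    `history` must be ordered oldest to newest.
    """
    # Find the most recent full backup; nothing before it is needed.
    last_full = max(
        (i for i, b in enumerate(history) if b.kind == "full"), default=None
    )
    if last_full is None:
        raise ValueError("no full backup available to restore from")

    chain = [history[last_full]]
    for backup in history[last_full + 1:]:
        if backup.kind == "differential":
            # A differential holds every change since the full backup, so
            # it replaces anything already queued after the full.
            chain = [history[last_full], backup]
        elif backup.kind == "incremental":
            # Each incremental holds only the changes since the previous
            # backup, so every one of them must be restored in order.
            chain.append(backup)
    return chain


days = ["Tue", "Wed", "Thu"]
week_incr = [BackupSet("Mon", "full")] + [BackupSet(d, "incremental") for d in days]
week_diff = [BackupSet("Mon", "full")] + [BackupSet(d, "differential") for d in days]

print([b.label for b in restore_chain(week_incr)])
print([b.label for b in restore_chain(week_diff)])
```

With the Monday-to-Thursday schedule from the differential example, the incremental week needs all four sets (Mon, Tue, Wed, Thu) to restore, while the differential week needs only two (Mon and Thu). The trade-off is that each differential grows larger as the week goes on.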

Best Practices

Store backup data offsite

As a best practice, you should regularly store backup data in an offsite location. In the event of a disaster in which the primary datacenter is destroyed, the offsite backups may be the only copies of your data that survive.

Just as you did with other disaster recovery processes, you will want to test your restore capabilities on a regular basis. Whether backups are part of a disaster recovery strategy for a system, or are only used for long-term data retention and worst-case scenarios, they need to be documented, monitored, and tested.
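
One simple way to test a file-level restore is to restore the data to a scratch location and compare it with a reference copy taken at backup time. The Python sketch below does this by hashing every file in both folder trees. The function names are hypothetical, and the approach assumes the reference copy has not changed since the backup ran; comparing against live data would report ordinary day-to-day edits as differences.

```python
import hashlib
from pathlib import Path


def tree_digests(root: Path) -> dict[str, str]:
    """Map each file's path relative to `root` to its SHA-256 digest."""
    digests = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            # read_bytes() loads the whole file into memory, which is fine
            # for a sketch but not for very large files.
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            digests[path.relative_to(root).as_posix()] = digest
    return digests


def compare_restore(source: Path, restored: Path) -> dict[str, list[str]]:
    """Report files that are missing, unexpected, or changed after a restore."""
    src = tree_digests(source)
    dst = tree_digests(restored)
    return {
        "missing": sorted(src.keys() - dst.keys()),
        "unexpected": sorted(dst.keys() - src.keys()),
        "changed": sorted(k for k in src.keys() & dst.keys() if src[k] != dst[k]),
    }
```

Calling compare_restore with the reference folder and the restored folder returns three lists. A restore test passes when all three are empty, and the result can be recorded alongside the rest of the recovery documentation.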
