Recovering from a Disaster in an Exchange Server 2007 Environment : Identifying the Extent of the Problem & Preparing for a More Easily Recoverable Environment

3/12/2012 3:09:09 PM

Before attempting to perform a recovery, it is important to first determine the type and extent of the problem. If the problem is not properly identified, you run the risk of performing an incorrect action that could actually make the problem worse. Equally important is to choose the most appropriate solution available. For example, restoring an entire server when only a single database failed would impact users who otherwise could have continued to use Exchange, and it would take significantly longer than restoring just the necessary database. Even though both plans of action would fix the issue, one is much simpler with less impact than the other.

Database Improvements Minimize Corruption

Exchange Server 2007 uses an updated version of the Extensible Storage Engine (ESE), the database engine also known as JET Blue. It includes error correcting code (ECC) enhancements that help minimize the number of errors in the database. The ECC automatically detects and corrects minor errors, such as a single flipped bit on a database page. The enhancements to the engine also improve the indexing and search functions.
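To illustrate the general idea behind single-bit error correction, here is a minimal Python sketch of a generic position-syndrome code. It is not ESE's actual on-page checksum or ECC format; the code choice and the function names are assumptions made for illustration. Storing the XOR of the positions of all set bits lets a reader locate, and flip back, any single bit that changed.

```python
def bit_syndrome(data: bytes) -> int:
    """XOR together the 1-based positions of every set bit in data."""
    syndrome = 0
    for byte_index, byte in enumerate(data):
        for bit in range(8):
            if (byte >> bit) & 1:
                syndrome ^= byte_index * 8 + bit + 1
    return syndrome


def correct_single_bit(data: bytes, stored_ecc: int) -> bytes:
    """Return data with at most one flipped bit repaired.

    stored_ecc is the syndrome computed when the page was written.
    A single flipped bit changes the syndrome by exactly that bit's
    1-based position, so XOR-ing old and new syndromes points at it.
    This sketch cannot reliably detect multi-bit damage; a real
    storage engine pairs ECC with an independent checksum for that.
    """
    difference = bit_syndrome(data) ^ stored_ecc
    if difference == 0:
        return data  # no single-bit error detected
    byte_index, bit = divmod(difference - 1, 8)
    if byte_index >= len(data):
        raise ValueError("damage is not a correctable single-bit error")
    repaired = bytearray(data)
    repaired[byte_index] ^= 1 << bit
    return bytes(repaired)
```

Usage: compute `bit_syndrome(page)` when a page is written, store it with the page, and pass both to `correct_single_bit` when the page is read back.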

Mailbox Content Was Deleted, Use the Undelete Function of Exchange and Outlook

When information is deleted from a user’s mailbox, whether it is an email message, a calendar appointment, a contact, or a task, the information is not immediately removed from the Exchange server. Deleted items first go into the Deleted Items folder in the user’s mailbox. Even after an item is removed from the Deleted Items folder, Exchange retains it on the server for the deleted item retention period configured on the mailbox database (14 days by default in Exchange Server 2007).

Note

Deleted item retention must be enabled on the mailbox database (that is, the retention period must be set to a value greater than zero) for deleted Outlook items to be retained on the Exchange server.
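Whether a deleted item can still be recovered is a matter of date arithmetic against the retention period configured on its mailbox database. The Python sketch below makes that check explicit; the function name is hypothetical, and it models only the retention window, not when the server actually purges expired items.

```python
from datetime import datetime, timedelta


def is_recoverable(deleted_at: datetime, now: datetime, retention_days: int) -> bool:
    """Return True if an item deleted at deleted_at is still inside the
    deleted item retention window configured on its mailbox database."""
    if retention_days <= 0:
        return False  # retention disabled: nothing is kept after deletion
    return now - deleted_at < timedelta(days=retention_days)
```

For example, with a 14-day retention period, an item deleted on March 1 is still within the window on March 10 but not on March 20.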

With a little training and documentation, end users can recover their own deleted mail items with ease. To recover mailbox items that have been deleted within Outlook, do the following:

1.	Highlight the Deleted Items folder.
2.	Click Tools, Recover Deleted Items.
3.	In the Recover Deleted Items From – Deleted Items window, select the items that you want to restore.
4.	Click the Recover Selected Items button.

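If the item was “Shift-deleted” (deleted with Shift+Delete, which bypasses the Deleted Items folder), the message is not lost, but by default Outlook offers the Recover Deleted Items tool only on the Deleted Items folder. To recover hard-deleted items from other folders, add the following Registry value on the client computer: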

1.	Click Start, Run, type `Regedt32.exe` in the Open text box, and then click OK.
2.	Browse to the following key in the Registry: `HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Exchange\Client\Options`
3.	On the Edit menu, click Add Value, and then add the following Registry value: value name `DumpsterAlwaysOn`, data type DWORD, value data `1`.
4.	Quit Registry Editor.

After you restart Outlook with this value set, you can select any folder in Outlook and use the Recover Deleted Items tool.
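If this setting has to reach many client computers, it can be scripted instead of entered by hand. The Python sketch below is one possible approach, not a Microsoft-provided tool, and its function names are hypothetical. It generates a .reg file containing the same key and value, and, on Windows only, can write the value directly with Python's standard `winreg` module (administrative rights are needed to write under HKEY_LOCAL_MACHINE).

```python
import sys

# Key and value taken from the manual Registry Editor steps above.
KEY_PATH = r"SOFTWARE\Microsoft\Exchange\Client\Options"
VALUE_NAME = "DumpsterAlwaysOn"


def dumpster_reg_file(enabled: bool = True) -> str:
    """Build .reg file text that sets DumpsterAlwaysOn to 1 (or 0)."""
    data = 1 if enabled else 0
    return (
        "Windows Registry Editor Version 5.00\n"
        "\n"
        f"[HKEY_LOCAL_MACHINE\\{KEY_PATH}]\n"
        f'"{VALUE_NAME}"=dword:{data:08x}\n'
    )


def apply_with_winreg(enabled: bool = True) -> None:
    """Write the value directly to the local registry (Windows only)."""
    if sys.platform != "win32":
        raise OSError("the winreg module is only available on Windows")
    import winreg

    # Creates the key if it does not exist, then sets the DWORD value.
    with winreg.CreateKeyEx(
        winreg.HKEY_LOCAL_MACHINE, KEY_PATH, 0, winreg.KEY_SET_VALUE
    ) as key:
        winreg.SetValueEx(key, VALUE_NAME, 0, winreg.REG_DWORD, 1 if enabled else 0)
```

To save the generated text in the UTF-16 encoding that Registry Editor uses for its own exports, write it with `open("dumpster.reg", "w", encoding="utf-16", newline="\r\n")`.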

Data Is Lost, Must Restore from Backup

If data is lost and the undelete function does not recover the information, the information might need to be restored from a backup. Depending on how much information was lost, this might involve a full recovery of the Exchange server from tape or snapshot, or it might involve restoring just a single mailbox, folder, or message. The key to restoring information is determining what needs to be restored. If just a single message needs to be restored, there is no reason to recover the entire production server. In many cases, when full backups of an Exchange server exist, the database containing the missing data can be restored to the recovery storage group (RSG) and the missing content merged back into the production database.

Data Is Okay, Server Just Doesn’t Come Up

The failure of a server does not necessarily mean that the data needs to be restored completely from tape. Often, a server goes down because of a failed power supply, motherboard, or even processor. In a situation where the hard drives on a dead server are still operational, the hard drives should be moved to an operational server or, at the very least, the data should be transferred to a different server. By preserving the data on the drives, an organization can avoid a more complicated reconstruction from a tape restore, which would lose any changes made since the last backup. Restoring from tape should always be considered a final option.

Data Is Corrupt—Some Mailboxes Are Accessible, Some Are Not

Data corruption typically occurs on Exchange servers when too much time has passed since the last database maintenance or when maintenance has been neglected altogether. Without periodic maintenance, the databases in Exchange are more susceptible to becoming corrupt. Exchange database corruption that is not repaired can make individual messages, or entire portions of mailboxes stored on an Exchange server, inaccessible.
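Because the risk of corruption is tied to how long it has been since maintenance last ran, it can help to flag overdue databases automatically. The Python sketch below assumes you already record a last-maintenance timestamp for each database (one of the documentation items listed later in this section); the function name, database names, and the age threshold you pass in are placeholders, not Microsoft guidance.

```python
from datetime import datetime, timedelta


def overdue_databases(last_maintenance: dict[str, datetime],
                      now: datetime,
                      max_age: timedelta) -> list[str]:
    """Return names of databases whose last maintenance is older than
    max_age, oldest first."""
    overdue = [(when, name) for name, when in last_maintenance.items()
               if now - when > max_age]
    return [name for when, name in sorted(overdue)]
```

The threshold is deliberately a parameter so that it can match whatever maintenance schedule your organization has actually adopted.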

When one or more mailboxes are corrupt, the good data in the mailboxes can be extracted with minimal data loss. By isolating the corruption and extracting the good data, an organization that can tolerate losing the damaged items can typically continue to operate with minimal downtime.

Data Is Corrupt, No Mailboxes Are Accessible

Depending on the condition of an Exchange database, the information might be so corrupt that none of the mailboxes are accessible. Recovering data from a corrupt database that cannot be accessed is a two-step process. The first step is to conduct maintenance to attempt to repair the database; the second step is to extract as much information from the database as possible.

Exchange Server Is Okay, Something Else Is Preventing Exchange from Working

If you know that the Exchange server and databases are operational and something else is preventing Exchange from working, the recovery process focuses on components such as Active Directory, Internet Information Services (IIS), the domain name system (DNS), and the network infrastructure, including site-to-site connectivity for replication.
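The scenarios covered so far amount to a triage decision: identify the situation, then take the least disruptive step that addresses it. The Python sketch below condenses them into a lookup table; the enum names and wording are hypothetical shorthand for the headings above, intended as a checklist aid rather than an automated diagnosis.

```python
from enum import Enum, auto


class Symptom(Enum):
    CONTENT_DELETED = auto()
    DATA_LOST = auto()
    SERVER_DOWN_DATA_OK = auto()
    SOME_MAILBOXES_CORRUPT = auto()
    ALL_MAILBOXES_CORRUPT = auto()
    EXCHANGE_OK_DEPENDENCY_FAILING = auto()


# First response for each scenario, choosing the least disruptive option.
FIRST_RESPONSE = {
    Symptom.CONTENT_DELETED:
        "Recover the items with the Outlook Recover Deleted Items tool.",
    Symptom.DATA_LOST:
        "Restore only what is missing from backup, e.g. via the recovery storage group.",
    Symptom.SERVER_DOWN_DATA_OK:
        "Move the working drives or data to an operational server; tape is a last resort.",
    Symptom.SOME_MAILBOXES_CORRUPT:
        "Isolate the corruption and extract the good data from affected mailboxes.",
    Symptom.ALL_MAILBOXES_CORRUPT:
        "Run maintenance to attempt a repair, then extract as much data as possible.",
    Symptom.EXCHANGE_OK_DEPENDENCY_FAILING:
        "Check Active Directory, IIS, DNS, and network and site-to-site connectivity.",
}


def first_response(symptom: Symptom) -> str:
    """Return the first recovery step for a diagnosed scenario."""
    return FIRST_RESPONSE[symptom]
```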

Mail Is Not Flowing Between Sites

If users are able to access their mailboxes normally and mail can be sent between users of the same site, odds are the issue is with the Hub Transport server. In larger implementations of Exchange 2007, the Hub Transport server role is likely to run on a system that doesn’t host mailboxes. Generally speaking, Hub Transport servers are not backed up because they contain no unique information. To restore these services, simply rebuild the Hub Transport server. Running Setup with the `/mode:RecoverServer` switch allows the server to recover its configuration from Active Directory, saving some configuration steps. This requires that the replacement server be built with the same name as the failed server.

If you need the transport services up very rapidly, consider adding the Hub Transport server role to an existing system. To add this role, follow these steps:

1.	From an existing Exchange 2007 server, open a command prompt.
2.	Change to the Exchange Server Bin folder (by default, `C:\Program Files\Microsoft\Exchange Server\Bin`).
3.	Type `exsetup.exe /mode:install /role:HT`.

Internet Mail Is Not Flowing

If you are unable to send mail to the Internet or receive mail from the Internet, there is a very good chance that the issue is a failure of the Edge Transport server. Most environments should run more than one Edge Transport server, preferably in different locations. Edge Transport servers are typically not backed up, so a failed Edge Transport server should be rebuilt. Because an Edge Transport server is not a member of the domain and stores its configuration in Active Directory Application Mode (ADAM) rather than in Active Directory, the `/mode:RecoverServer` switch does not apply to it. Instead, after reinstalling the role, re-create the EdgeSync subscription, or import a configuration previously exported with the cloned configuration scripts (ExportEdgeConfig.ps1 and ImportEdgeConfig.ps1).

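If you need Internet mail flowing again very rapidly, consider installing the Edge Transport server role on a spare system. Unlike the Hub Transport role, the Edge Transport role cannot be combined with any other Exchange 2007 server role, so it cannot be added to an existing Mailbox, Client Access, or Hub Transport server. To install this role on a standalone system that already has the Edge Transport prerequisites installed, follow these steps:

1.	On the standalone system, open a command prompt.
2.	Change to the root of the Exchange Server 2007 installation media.
3.	Type `setup.com /mode:install /role:ET`.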

Note

If you place the Edge Transport role on a new system, you need to make sure that incoming Simple Mail Transfer Protocol (SMTP) mail from the Internet reaches this system. This might involve a change in configuration of MX records, firewall rules, Network Address Translation (NAT), or your antispam/antivirus gateway. Be sure you understand the implications of putting the Edge Transport role on another system before attempting this fix.
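The MX change matters because sending servers try a domain's MX hosts in order of preference value, lowest first, and randomize the order among hosts that share a preference (RFC 5321, section 5.1). A replacement Edge Transport server therefore receives Internet mail only if it appears in that list. The Python sketch below models the ordering with hypothetical host names; it does not query DNS and ignores sender-side retry behavior.

```python
import random
from itertools import groupby


def delivery_order(mx_records: list[tuple[int, str]], rng=random) -> list[str]:
    """Order MX hosts the way a sending SMTP server tries them:
    lowest preference value first, equal-preference hosts shuffled."""
    ordered = []
    by_preference = sorted(mx_records, key=lambda record: record[0])
    for _, group in groupby(by_preference, key=lambda record: record[0]):
        hosts = [host for _, host in group]
        rng.shuffle(hosts)
        ordered.extend(hosts)
    return ordered


def receiving_host(mx_records: list[tuple[int, str]], is_up, rng=random):
    """Return the first host in delivery order that accepts mail, or None."""
    for host in delivery_order(mx_records, rng):
        if is_up(host):
            return host
    return None
```

For example, with records `(10, "edge1.example.com")` and `(20, "edge2.example.com")`, mail reaches `edge2` only while `edge1` is down; a rebuilt server under a new name receives nothing until an MX record points to it.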

What to Do Before Performing Any Server-Recovery Process

Before performing a full server recovery, or before trying to recover a server by installing service packs, patches, or updates or by other recovery procedures, perform a full backup of the server.

At first, it might seem unnecessary to back up a server that isn’t working properly, but during the problem-solving and debugging process, it is quite possible for a server to end up in even worse shape after a few updates and fixes have been applied. The initial problem might have been that a single mailbox couldn’t be accessed, and after some problem-solving efforts, the entire server might be inaccessible. A backup provides a rollback to the point of the initial problem state. When making changes in an attempt to fix a server, you always want a way to roll back a change in case it turns out to make the situation worse. When the backup is complete, verify that it is valid: make sure that no open files were skipped during the backup or, if any were skipped, that a separate open-file backup process captured them. This way, you will always have the ability to return to your starting point in case you need to try a different method to fix the server.
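One way to check that a file-level copy is complete and intact is to compare cryptographic hashes of the source files against the backup copies. The Python sketch below assumes the backup is available as a plain directory tree (for example, a restored copy or a snapshot); it does not understand Exchange-aware streaming backups, and the function names are hypothetical. Database and log files that were open and changing during the copy will legitimately differ unless both sides come from a consistent point in time.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in 1 MB chunks so large database files fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def compare_trees(source: Path, backup: Path) -> dict[str, list[str]]:
    """Report files missing from the backup and files whose contents differ."""
    report: dict[str, list[str]] = {"missing": [], "different": []}
    for src_file in sorted(p for p in source.rglob("*") if p.is_file()):
        relative = src_file.relative_to(source)
        copy = backup / relative
        if not copy.is_file():
            report["missing"].append(str(relative))
        elif sha256_of(src_file) != sha256_of(copy):
            report["different"].append(str(relative))
    return report
```

An empty report for both lists is the condition to look for before starting risky repair work.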

Caution

When performing any recovery of an Exchange server or resource, be careful what you delete, modify, or change. As a rule of thumb, never delete objects that are replicated throughout the directory; otherwise, you will not be able to restore the object, because each object is uniquely identified. As an example, if you plan to restore an entire server from tape, you do not want to first delete the server and then add the server back during the restoration process. The restoration process requires the existence of the old server in the directory. Deleting the server object and then adding the object again later gives the object a completely different globally unique identifier (GUID). Even though you restore the entire Exchange server from tape, the GUID of the server and of all of the objects in the server will be different, making it more difficult to recover the server. Other replicated objects that should not be deleted include public folders, public folder trees, groups, and distribution lists.
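The GUID problem can be illustrated with a toy model in which other objects refer to a server by its GUID rather than by its name. This Python sketch is a simplification and not how Active Directory actually stores Exchange configuration; the class, variable, and server names are hypothetical.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class DirectoryObject:
    name: str
    # Each new object receives a fresh, random GUID when it is created.
    guid: uuid.UUID = field(default_factory=uuid.uuid4)


directory: dict[uuid.UUID, DirectoryObject] = {}

original = DirectoryObject("EXCH01")
directory[original.guid] = original
# In this toy model, a mailbox records its home server by GUID.
mailbox_home_server = original.guid

# Delete the server object, then re-create it with the same name...
del directory[original.guid]
recreated = DirectoryObject("EXCH01")
directory[recreated.guid] = recreated

# ...the name matches, but the GUID is new, so the old reference is orphaned.
print(recreated.guid == original.guid)   # False
print(mailbox_home_server in directory)  # False
```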

Validating Backup Data and Procedures

Another very important task that should be done before doing any maintenance, service, or repairs on an Exchange server is to validate that a full backup of the server exists, test the condition of the backup, and then secure the backup so that it is safe. Far too many organizations proceed with risky recovery procedures, believing that they have a fallback position by restoring from tape, only to realize that the tape backup is corrupt or that a complete backup does not exist. Equally important is to be sure that the tape you might need is actually onsite. Many companies send tapes offsite for storage. If you are depending on a particular backup tape for your rollback, be sure it is readily accessible.

If the administrators of the network realize that there is no clean backup, the procedures taken to recover the system might be different than if a backup had existed. If a full backup exists and is verified to be in good condition, the organization has an opportunity to restore from tape if a full restore is necessary.

Preparing for a More Easily Recoverable Environment

An organization can take steps to make its environment easier to recover. These include documenting server states and conditions, performing specific backup procedures, and setting up new features in Exchange Server 2007 that simplify the restoration process. By maintaining these processes and performing regular test restores, a company can be confident that it can recover from a disaster quickly and easily.

Documenting the Exchange Environment

Key to the success of recovering an Exchange server or an entire Exchange environment is having documentation on the server configurations. Having specific server configuration information documented helps to identify which server is not operational, the routing of information between servers, and, ultimately, the impact that a server failure or server recovery will have on the rest of the Exchange environment. By having a complete understanding of the Exchange environment as a whole, an administrator can often bring up temporary services to alleviate a failure and give themselves more time to fix the issue and determine the root cause.

Note

A utility called ExchDump can assist an administrator with baselining and improving the environment. Use ExchDump to export and document a server’s configuration. The ExchDump utility can be downloaded from the Microsoft Exchange download page at http://www.microsoft.com/exchange/downloads/2003/default.mspx.

Although this utility was originally written for Exchange Server 2003, it works fine for extracting the same information from an Exchange 2007 server.

Some of the items that should be documented include the following:

Server name
Server roles held
Version of Windows on servers (including service pack)
Version of Exchange on servers (including service pack)
Organization name in Exchange
Site names
Storage group names
Database names
Location of databases
Size of databases
When database maintenance was last run
Public folder tree name
Replication process of public folders
Security delegation and administrative rights
Names and locations of global catalog servers
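Keeping this inventory in a structured, machine-readable form makes it easy to store with the recovery documentation and to compare snapshots over time. The Python sketch below models a subset of the items above as dataclasses and serializes them to JSON; the field names and sample values are hypothetical, and in practice the values would be gathered with a tool such as ExchDump or from the Exchange Management Shell.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class DatabaseRecord:
    storage_group: str
    name: str
    path: str
    size_gb: float
    last_maintenance: str  # ISO 8601 date, e.g. "2012-03-11"


@dataclass
class ServerRecord:
    name: str
    roles: list[str]
    windows_version: str   # including service pack
    exchange_version: str  # including service pack
    organization: str
    site: str
    databases: list[DatabaseRecord] = field(default_factory=list)
    global_catalogs: list[str] = field(default_factory=list)


def to_json(servers: list[ServerRecord]) -> str:
    """Serialize the inventory so it can be versioned and compared over time."""
    return json.dumps([asdict(s) for s in servers], indent=2, sort_keys=True)
```

Because `sort_keys=True` produces a stable key order, two exports taken at different times can be compared with an ordinary text diff.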

Documenting the Backup Process

To simplify a restore of an Exchange environment, it is important to start with a clean backup, and a clean backup comes from following a proper backup process. Create a backup process that works, document the step-by-step procedures to back up the server, follow the procedures regularly, and then validate that the backups have completed successfully.

Also, when configurations change, the backup process as well as system configurations should be documented and validated again, to make sure that the backups are being completed properly.

Documenting the Recovery Process

An important aspect of recovery feasibility is knowing how to recover from a disaster. Just knowing what to back up and what scenarios to plan for is not enough. Restore processes should be created and tested to ensure that a restore can meet service level agreements (SLAs) and that the staff members understand all the necessary steps.

When a process is determined, it should be documented, and the documentation should be written to make sense to the desired audience. For example, if a failure occurs in a satellite office that has only marketing employees and one of them is forced to recover a server, the documentation needs to be written so that it can be understood by just about anyone. If the information technology (IT) staff will be performing the restore, the documentation can be less detailed, because it can assume a certain level of knowledge and expertise with the server product. The first paragraph of any document related to backup and recovery should be a summary of what the document is used for and the level of skill necessary to perform the task and understand the document.

A recovery plan for an Exchange problem should focus not only on getting the entire Exchange server back up and operational but also on smaller steps that can minimize downtime. As an example, if an Exchange server has failed, instead of trying to restore 100GB of mail to the server, which can take hours, if not days, an organization can choose to restore just the user Inboxes, calendars, and contacts first. After this faster recovery of core information, the balance of the information can be restored over the next several hours.
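Whether a restore can meet an SLA is largely arithmetic: restore time is roughly the amount of data divided by the sustained restore throughput, plus fixed overhead such as mounting media and replaying logs. The Python sketch below makes that comparison; the 20 GB per hour throughput, the 1-hour overhead, the 4-hour SLA, and the 15GB core-data size are hypothetical placeholders to be replaced with figures measured during your own test restores.

```python
def restore_hours(data_gb: float, throughput_gb_per_hour: float,
                  overhead_hours: float = 0.0) -> float:
    """Rough restore duration: data volume / sustained throughput + fixed overhead."""
    if throughput_gb_per_hour <= 0:
        raise ValueError("throughput must be positive")
    return data_gb / throughput_gb_per_hour + overhead_hours


def meets_sla(data_gb: float, throughput_gb_per_hour: float,
              sla_hours: float, overhead_hours: float = 0.0) -> bool:
    """True if the estimated restore finishes within the SLA window."""
    return restore_hours(data_gb, throughput_gb_per_hour, overhead_hours) <= sla_hours


# Hypothetical figures: full 100GB restore vs. a 15GB core-data restore.
full = restore_hours(100, 20, overhead_hours=1)  # 6.0 hours
core = restore_hours(15, 20, overhead_hours=1)   # 1.75 hours
```

Under these assumed numbers, only the core-data restore fits a 4-hour SLA, which is the argument for restoring Inboxes, calendars, and contacts first.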

The other advantage of having a properly documented restore procedure is that it greatly reduces the chances of human error occurring during a restore. Recovering a failed server while hundreds or possibly thousands of email users are affected is a stressful situation. This isn’t the time to learn how to perform a restore. The goal in this situation is for the administrator to be able to follow a clearly documented and well-tested process to ensure that no steps are missed and that no information is entered incorrectly. Having well-documented steps can greatly reduce the stress of this situation and increase the chances of a successful restore.

Including Test Restores in the Scheduled Maintenance

Part of a successful disaster recovery plan involves periodically testing the restore procedures to verify accuracy and testing the backup media to ensure that data can actually be recovered. Most organizations or administrators assume that if the backup software reports “Successful,” the backup is good and data can be recovered. If special backup considerations are not addressed, a backup that reports success might not contain everything necessary to restore a server after data loss or software corruption.

Restores of file data, application data, and configurations should be performed as part of a regular maintenance schedule to ensure that the backup method is correct and that disaster recovery procedures and documentation are current. Such tests also should verify that the backup media can be read from and used to restore data. Adding periodic test restores to regular maintenance intervals ensures that backups are successful and familiarizes the administrators with the procedures necessary to recover so that when a real disaster occurs, the recovery can be performed correctly and efficiently the first time.

These test restores should occur in a lab environment where end users won’t be affected. The restores should vary in type, testing single mailbox restores, complete server restores, and full site restores where even domain controllers might need to be restored from scratch. This helps ensure that staff members are comfortable with the process and will have no problem performing a restore in production should the occasion ever arise.