One of the most important and overlooked
functions of administering a Windows network is planning for and
implementing good backup and recovery solutions. Typically,
administrators learn to implement good recovery processes and practices
only after they have survived a disaster situation where they lost data
that was critical to the company. If you want to be a good
administrator, plan for worst-case scenarios and hope for the best.
Disaster recovery planning
In most organizations, planning for disaster
encompasses a lot more than just the IT department. How a company
defines a disaster varies from one organization to another. A small
business may consider a disaster the complete loss of the office
building. A larger organization could consider the loss of a critical
system as a disaster. It really depends on your recovery needs and how
dependent your organization is on the systems that support it. The
actual process of planning for a major disaster requires involvement
from many parts of the company and extends well beyond
the formal IT organization. For example, if the company loses an
entire building because of fire, plans will need to be put into action
that cover logistics around facilities, communications, emergency
services, IT, and more. Do not treat disaster recovery as an “IT thing”;
treat it as a business concern. Work with the various business units to
determine which systems are critical to business processes and ensure
that you have a good plan in place for those first. Work with the owner
of each business application and determine a realistic option for the
following:
Recovery Point Objective (RPO)—The RPO is
the maximum acceptable amount of data loss, expressed as a span of time. For example, can
your organization function if the given system loses the last 24 h of
data?
Recovery Time Objective (RTO)—The RTO is the maximum acceptable downtime for a given system. For example, the
organization may be able to tolerate an email outage of no more than
4 h in the event of a disaster. This means that email must be back up and running,
with messages flowing, within 4 h of the disaster.
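The two objectives can be checked against a backup schedule with simple arithmetic. The following is a minimal Python sketch, not a definitive tool: the `RecoveryObjectives` class and `check_objectives` function are hypothetical names introduced here, and the sketch assumes the worst case for data loss is one full backup interval (a failure just before the next backup runs).

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass
class RecoveryObjectives:
    """Objectives agreed with the business owner of one system (illustrative)."""
    system: str
    rpo: timedelta  # maximum acceptable window of lost data
    rto: timedelta  # maximum acceptable downtime


def check_objectives(objectives, backup_interval, measured_restore_time):
    """Return a list of problems; an empty list means both objectives are met.

    Assumption: in the worst case a failure occurs just before the next
    backup runs, so up to one full backup interval of data is lost.
    """
    problems = []
    if backup_interval > objectives.rpo:
        problems.append(
            f"{objectives.system}: backup interval {backup_interval} "
            f"exceeds RPO {objectives.rpo}"
        )
    if measured_restore_time > objectives.rto:
        problems.append(
            f"{objectives.system}: restore time {measured_restore_time} "
            f"exceeds RTO {objectives.rto}"
        )
    return problems


# Illustrative values only: a 24 h RPO and a 4 h RTO for email.
email = RecoveryObjectives("email", rpo=timedelta(hours=24), rto=timedelta(hours=4))
for problem in check_objectives(
    email,
    backup_interval=timedelta(hours=24),
    measured_restore_time=timedelta(hours=6),
):
    print(problem)
```

With these sample values the nightly backup satisfies the RPO, but a restore measured at 6 h misses the 4 h RTO, which is the kind of gap that regular testing is meant to reveal.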
As you build your disaster plan, document and test it. You do not want to guess at critical decisions in the time of crisis.
As you begin to build your disaster recovery plan,
consider the various technologies that can be used to provide recovery
from a disaster. These can include clustering technologies, offsite
replication services, or traditional backups. For example, you might
want to consider supporting a critical file server via a geo-cluster
for automated failover in the event of a disaster. On the other hand,
you may only want to perform tape backups of your print server as it
may not be deemed critical. Again, you need to thoroughly document each
system and the technologies you can use to recover it.
Document the recovery process in such a way that another person could
perform the recovery in the event that you are unable to do so.
After determining the method to use for recovery,
implement and test it on a regular basis. Without regular testing,
there is no guarantee that your recovery process will work as you
expect in the time of a real disaster.
As part of your disaster recovery planning, you will
need to plan for and implement a good backup strategy. Even with a good
disaster recovery plan, you will still need to back up data
using traditional backup methods, both for data retention and for worst-case
scenarios such as a failure of the disaster recovery process itself.
Backups
Creating a good backup strategy could very well be
one of the toughest aspects of your job as an administrator. This
strategy should be an evolving process that is modified as necessary to
keep pace with the systems that support your organization’s business functions. Again, you
need to understand, from the business perspective,
how important a given application is to your organization. It may
be determined that a SQL server must be backed up every 4 h, yet an
application server may need only a weekly backup.
Depending on the size of your organization and the
number of servers you manage, you may need to consider an enterprise
backup solution rather than the built-in Windows Server Backup.
Microsoft offers its own enterprise solution as part of
the System Center suite of products. System Center Data Protection
Manager (DPM) can be used to back up Windows servers as well as
applications such as SQL Server, Exchange, and SharePoint.
Common backup strategies include
disk-based backup solutions, also known as Disk-to-Disk-to-Tape (D2D2T).
These solutions involve backing up data to disk drives allowing for
quick recoveries. After a defined period of time, the backup is then
moved from disk to tape where it can be taken offsite for long-term
retention. Backups tend to be performed using one or a combination of
several of the following backup types:
Full backup—This backup type creates a
backup of all selected files and folders. When a full backup is
complete, the data is marked as being backed up.
Incremental
backup—An incremental backup only backs up changes since the last time a
backup was completed. For example, if a backup was completed yesterday,
and four files change during the day, an incremental backup will only
back up those four files. A recovery would require restoring the full
backup and then the incremental. If multiple incrementals are run
between full backups, all incremental backup sets will be required in
addition to the full backup when restoring data.
Differential
backup—A differential backup performs backups of only those files that
have changed since the last full backup. Similar to an incremental
backup, a differential only backs up changes; however, it backs up all
changes since the last full. For example, if a full backup is run on
Monday, and differentials are run on Tuesday, Wednesday, and Thursday,
each differential will back up all changes that took place since Monday’s full backup.
Synthetic
full—A synthetic full backup creates a full backup from the most recent
full backup plus subsequent incremental and/or differential backups.
The resulting synthetic full backup is identical to what would have
been created from a full backup without the need to transfer data from
the client computer to the backup media. Synthetic full backups can
greatly enhance restore processes, especially if a given full backup
cycle contains many incremental backup sets.
Transaction
log backup—Transaction log backups are used to rapidly back up the logs used
by transaction-based systems such as database servers. Since
transaction logs are small, they can be backed up rapidly, allowing for
point-in-time copies of the data to be taken on a regular basis. For
example, a full database backup can be taken at night with transaction
log backups taken every 6 h. If the SQL server failed, the data could
be restored to the point in time of the most recent transaction log
backup, which would be no more than 6 h old.
Real-time data
protection (RDP)—RDP constantly monitors the data for changes and backs
up all changes as they are made. This provides for a restore of data
within minutes of the time it was lost.
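The difference between incremental and differential backups shows up most clearly at restore time. The following Python sketch works out which backup sets a restore needs. `BackupSet` and `restore_chain` are hypothetical names used only for illustration, and the sketch assumes, per the definitions above, that a differential always contains every change since the last full backup; a real backup product tracks this in its own catalog.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BackupSet:
    label: str  # for example, the day the backup ran
    kind: str   # "full", "incremental", or "differential"


def restore_chain(history):
    """Return the backup sets needed to restore to the newest backup.

    `history` is a list of BackupSet objects in chronological order.
    Assumption: each differential holds all changes since the last full,
    so only the newest differential is needed.
    """
    full_positions = [i for i, b in enumerate(history) if b.kind == "full"]
    if not full_positions:
        raise ValueError("cannot restore without a full backup")

    # Start from the most recent full backup.
    start = full_positions[-1]
    chain = [history[start]]
    later = history[start + 1:]

    # The newest differential replaces everything between it and the full.
    diff_positions = [i for i, b in enumerate(later) if b.kind == "differential"]
    if diff_positions:
        newest = diff_positions[-1]
        chain.append(later[newest])
        later = later[newest + 1:]

    # Every remaining incremental must be applied, in order.
    chain.extend(b for b in later if b.kind == "incremental")
    return chain


days = ("Tue", "Wed", "Thu")
week_incremental = [BackupSet("Mon", "full")] + [BackupSet(d, "incremental") for d in days]
week_differential = [BackupSet("Mon", "full")] + [BackupSet(d, "differential") for d in days]

print([b.label for b in restore_chain(week_incremental)])
print([b.label for b in restore_chain(week_differential)])
```

For a Monday full followed by three incrementals, the restore needs all four sets; with three differentials instead, it needs only Monday's full and Thursday's differential. The trade-off is that each differential grows larger as the week goes on.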
Store backup data offsite
As a best practice, you should regularly store backup
data in an offsite location. In the event of a disaster in which the
primary datacenter is destroyed, you may need access to offsite
backups.
Just as you did with other disaster
recovery processes, you will want to test your restore capabilities
on a regular basis. Whether backups are part of a disaster recovery
strategy for a system or are used only for long-term data retention and
worst-case scenarios, they need to be documented, monitored, and tested.
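One simple way to test a restore is to restore a sample of data to a scratch location and compare it with a known-good copy. The Python sketch below is one possible approach, not a feature of any particular backup product: `find_restore_mismatches` is a hypothetical helper, and it assumes the reference folder has not changed since the backup was taken.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Hash a file in chunks so large files do not have to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_restore_mismatches(reference_dir, restored_dir):
    """Return relative paths that are missing from, or differ in, the restore.

    Assumption: `reference_dir` is unchanged since the backup was taken.
    Files that exist only in the restored copy are not reported.
    """
    reference_dir = Path(reference_dir)
    restored_dir = Path(restored_dir)
    mismatches = []
    for reference_file in sorted(reference_dir.rglob("*")):
        if not reference_file.is_file():
            continue
        relative = reference_file.relative_to(reference_dir)
        restored_file = restored_dir / relative
        if not restored_file.is_file():
            mismatches.append(relative.as_posix())
        elif sha256_of(reference_file) != sha256_of(restored_file):
            mismatches.append(relative.as_posix())
    return mismatches
```

An empty result means every reference file was restored with identical contents. Recording the result of each test run gives you the documentation trail that the restore process works.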